Abstract. Proofs of the probability density function (pdf) of the
Wishart distribution tend to be complicated by geometric viewpoints,
tedious Jacobians and algebra that is not self-contained. In this paper,
some known proofs and simple new ones for the uncorrelated and correlated
cases are provided with didactic explanations. For the new derivation of
the uncorrelated case, an elementary direct derivation of the distribution
of the Bartlett-decomposed matrix is provided. For the derivation of the
correlated case from the uncorrelated one, simple methods including a
new one are shown.
1 Introduction


The Wishart distribution has often been used for the matrix of the squares and
cross products of random vectors. In multivariate analysis or, more specifically,
structural equation modeling (SEM), a modified log-likelihood of this distribution
(see e.g., Ogasawara, 2016, Equation (2.8)) has been used, probably as
a gold-standard discrepancy function for estimation, even under non-normality
though the distribution is given under multivariate normality. In SEM, variations
of the distribution are also used as priors for covariance matrices (Liu, Qu,
Zhang, & Wu, 2022; Zhang, 2021). The distribution has various extensions, e.g.,
the inverted distribution (Anderson, 2003, Section 7.7), singular cases (Bodnar
& Okhrin, 2008; Mathai & Provost, 2022; Srivastava, 2003), complex-valued ones
(Srivastava & Khatri, 1979, Section 3.7; Mathai, Provost, & Haubold, 2022,
Section 5.5), those with two different degrees of freedom (df's) (Ogasawara, 2023b),
the joint distributions of the Wishart matrix and normal vectors (Yonenaga,
2022) and cases under arbitrary distributions (Hsu, 1940; Srivastava & Khatri,
1979, Lemma 3.2.3; Olkin, 2002, Section 2).

Asymptotic results associated with the Wishart distribution are also of practical
use. In SEM, the asymptotic standard errors of the Wishart maximum
likelihood estimators for structural parameters are often used under normality
or non-normality. In this situation, a large df is assumed. When the number
of variables is also large under some condition as in high-dimensional data (see
e.g., Yao, Zheng, & Bai, 2015), the limiting distribution of the eigenvalues of the
Wishart matrix is given by the Marčenko and Pastur (1967, M-P) distribution
(the author is indebted to an anonymous reviewer for this point). The M-P
distribution gives a tool for the problems of the numbers of factors or components
in SEM (Chen & Weng, 2023).

The probability density functions (pdf's) of the Wishart distribution were
given by Fisher (1915, p. 510) and Wishart (1928) for the bivariate and general
multivariate cases, respectively. The derivations tend to be involved with geometric
viewpoints (see e.g., Anderson, 2003, Section 7.2) or algebra that is not
self-contained, as criticized by Ghosh and Sinha (2002) (for the references of
derivations see Srivastava & Khatri, 1979, p. 73 and Anderson, 2003, pp. 256-257).
Khatri (1963) showed a brief derivation using an integral of the unity over the
constant quadratic forms having the chi-square density. Ghosh and Sinha (2002)
gave a self-contained concise proof of the Wishart density though it is an indirect
method. In spite of the frequent use of the Wishart density and its variations in SEM,
the derivation of the pdf often seems to be intractable for beginning
students/researchers. Probably, many of them use the Wishart pdf as if referencing
a cookbook without understanding the derivation, which is an undesirable situation. A
relatively concise derivation is to use the characteristic function and its inversion
(Wishart & Bartlett, 1933; Wilks, 1962, Section 18.2). However, this method
requires the Fourier integral theorem or Lévy's inversion formula, which may be
unfamiliar to beginners. In this paper, almost self-contained known proofs and
new ones for the uncorrelated and correlated multivariate cases are shown with
didactic explanations.


2 Proofs of the Wishart Distributions 


2.1 The distribution of a lower-triangular matrix for the Wishart 
density 


Suppose that in the random matrix $X = \{X_{ij}\}$ $(i = 1,\dots,p;\ j = 1,\dots,n;\ p \le n)$,
each column is multivariate normally distributed as $N_p(0, I_p)$ independent of
the other columns with the population mean vector $0$ and covariance matrix $I_p$
denoting the $p \times p$ identity matrix. That is, all the elements of $X$ are mutually
independently distributed as standard normal.

Let $S = XX^{T} = TT^{T}$ be Bartlett-decomposed such that $T$ is a $p \times p$
lower-triangular matrix whose diagonal elements are positive. Define
$s = (s_{11}, s_{21}, s_{22}, \dots, s_{p1}, \dots, s_{pp})^{T}$ and
$t = (t_{11}, t_{21}, t_{22}, \dots, t_{p1}, \dots, t_{pp})^{T}$, where $s$ and
$t$ are the $\{(p^{2} + p)/2\} \times 1$ vectors of the non-duplicated elements of $S$ and the
random elements of $T$, respectively. Let $|\partial s/\partial t^{T}|_{+}$ (Srivastava & Khatri, 1979,
p. 28) be the absolute value of the determinant of the Jacobian matrix for the
transformation $S \to T$:

$$\frac{\partial s}{\partial t^{T}} = \left\{\frac{\partial s_{ij}}{\partial t_{kl}}\right\} \quad (p \ge i \ge j \ge 1;\ p \ge k \ge l \ge 1),$$

using the double subscript notation for the rows of the elements of $s$ and columns
for those of $t$ in $\partial s/\partial t^{T}$. Then, the Jacobian of the transformation is given by
$|\partial s/\partial t^{T}|_{+}$. For the proof of the Wishart distribution, the following lemmas are
used.


Lemma 1 Suppose that each of the $2m$ variables $X_{ik}$ and $X_{jk}$ $(i \ne j;\ k = 1,\dots,m;\ m = 1,2,\dots)$
independently follows $N(0,1)$. Then, the distribution of
$\sum_{k=1}^{m} X_{ik}X_{jk}$ is the same as that of $X_{i1}\sqrt{\sum_{k=1}^{m} X_{jk}^{2}}$ $(i \ne j)$.


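Proof. When $m = 1$, the equal distribution of $X_{i1}X_{j1}$ and $X_{i1}\sqrt{X_{j1}^{2}} = X_{i1}|X_{j1}|$
is given by the symmetric distribution of $X_{i1}X_{j1}$ about zero. For general cases,
consider the moment generating functions (mgf's). By definition, the mgf of
$\sum_{k=1}^{m} X_{ik}X_{jk}$ is

$$
\begin{aligned}
E\left\{\exp\left(t\sum_{k=1}^{m} X_{ik}X_{jk}\right)\right\}
&= \prod_{k=1}^{m} E\{\exp(tX_{ik}X_{jk})\} \\
&= \prod_{k=1}^{m} \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp(t x_{ik}x_{jk})\exp\left(-\frac{x_{ik}^{2}}{2}\right)\exp\left(-\frac{x_{jk}^{2}}{2}\right) dx_{ik}\,dx_{jk} \\
&= \prod_{k=1}^{m} \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} \exp\left\{-\frac{(x_{ik} - t x_{jk})^{2}}{2}\right\} dx_{ik}\right]\exp\left\{-\frac{(1-t^{2})x_{jk}^{2}}{2}\right\} dx_{jk} \\
&= \prod_{k=1}^{m} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left\{-\frac{(1-t^{2})x_{jk}^{2}}{2}\right\} dx_{jk} \\
&= (1-t^{2})^{-m/2} \quad (|t| < 1).
\end{aligned}
$$

On the other hand, the mgf of $X_{i1}\sqrt{\sum_{k=1}^{m} X_{jk}^{2}}$ is

$$
\begin{aligned}
E\left\{\exp\left(tX_{i1}\sqrt{\textstyle\sum_{k=1}^{m} X_{jk}^{2}}\right)\right\}
&= \frac{1}{(2\pi)^{(m+1)/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \exp\left(t x_{i1}\sqrt{\textstyle\sum_{k=1}^{m} x_{jk}^{2}} - \frac{x_{i1}^{2}}{2} - \frac{\sum_{k=1}^{m} x_{jk}^{2}}{2}\right) dx_{i1}\,dx_{j1}\cdots dx_{jm} \\
&= \frac{1}{(2\pi)^{(m+1)/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} \exp\left\{-\frac{\left(x_{i1} - t\sqrt{\sum_{k=1}^{m} x_{jk}^{2}}\right)^{2}}{2}\right\} dx_{i1}\right] \\
&\qquad \times \exp\left\{-\frac{(1-t^{2})\sum_{k=1}^{m} x_{jk}^{2}}{2}\right\} dx_{j1}\cdots dx_{jm} \\
&= \frac{1}{(2\pi)^{m/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \exp\left\{-\frac{(1-t^{2})\sum_{k=1}^{m} x_{jk}^{2}}{2}\right\} dx_{j1}\cdots dx_{jm} \\
&= (1-t^{2})^{-m/2} \quad (|t| < 1).
\end{aligned}
$$

It is found that the above two mgf's are the same, which shows the same
distribution of $\sum_{k=1}^{m} X_{ik}X_{jk}$ and $X_{i1}\sqrt{\sum_{k=1}^{m} X_{jk}^{2}}$ $(i \ne j)$.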


The second proof using the pdf of the chi-distribution is given in the
supplement to this paper (Ogasawara, 2023a).
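As a numerical illustration of Lemma 1 (not a proof), the two mgf's can be estimated by simulation and compared with $(1-t^{2})^{-m/2}$. The following is a minimal sketch using only the Python standard library; the function names `mgf_estimates` and `mgf_exact`, the seed, and the choices $m = 3$ and $t = 0.3$ are illustrative assumptions and do not come from the paper.

```python
import math
import random


def mgf_estimates(m=3, t=0.3, draws=100_000, seed=12345):
    """Monte Carlo estimates of E[exp(t*Z)] for the two variables in Lemma 1."""
    rng = random.Random(seed)
    sum_product, sum_chi = 0.0, 0.0
    for _ in range(draws):
        xi = [rng.gauss(0.0, 1.0) for _ in range(m)]
        xj = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Z1 = sum_k X_ik X_jk: product sum of m independent normal pairs.
        z1 = sum(a * b for a, b in zip(xi, xj))
        # Z2 = X_i1 * sqrt(sum_k X_jk^2): a normal times an independent chi variable.
        z2 = xi[0] * math.sqrt(sum(b * b for b in xj))
        sum_product += math.exp(t * z1)
        sum_chi += math.exp(t * z2)
    return sum_product / draws, sum_chi / draws


def mgf_exact(m, t):
    """Closed form (1 - t^2)^(-m/2), valid for |t| < 1."""
    return (1.0 - t * t) ** (-m / 2.0)


if __name__ == "__main__":
    est_product, est_chi = mgf_estimates()
    print(est_product, est_chi, mgf_exact(3, 0.3))
```

Both estimates should agree with the closed form up to Monte Carlo error, which is consistent with the equality of the two distributions stated in the lemma.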
Lemma 2 (Deemer & Olkin, 1951, Theorem 4.1; Srivastava & Khatri, 1979,
Exercise 1.28 (i); Muirhead, 1982, Theorem 2.1.9; Anderson, 2003, p. 255). The
Jacobian of the transformation $S \to T$ is

$$|\partial s/\partial t^{T}|_{+} = 2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}.$$


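Proof. Deemer and Olkin (1951) derived the result as a special case of another
general theorem. Muirhead (1982) used the exterior product while an essential
standard proof was given by Anderson (2003). The derivation is given here by
induction. When $p = 1$, $|\partial s/\partial t^{T}|_{+} = ds_{11}/dt_{11} = dt_{11}^{2}/dt_{11} = 2t_{11} > 0$,
showing that the above result holds. Assume that the result holds when $p = p^{*}$,
i.e., $|\partial s/\partial t^{T}|_{+} = 2^{p^{*}}\prod_{i=1}^{p^{*}} t_{ii}^{\,p^{*}-i+1}$ $(p^{*} \ge 1)$. When $p = p^{*} + 1$, the elements
$s_{p^{*}+1,1}, s_{p^{*}+1,2}, \dots, s_{p^{*}+1,p^{*}+1}$ are added to $s$ at its end. Similarly,
$t_{p^{*}+1,1}, t_{p^{*}+1,2}, \dots, t_{p^{*}+1,p^{*}+1}$ are added to $t$. Noting that
$s_{ij} = \sum_{k=1}^{j} t_{ik}t_{jk}$ $(p \ge i \ge j \ge 1)$, we find that $\partial s/\partial t^{T}$ is a
lower-triangular matrix. Consequently, the
added factor in $|\partial s/\partial t^{T}|_{+}$ when $p = p^{*} + 1$ over when $p = p^{*}$ is given by the
product of the added diagonal elements:

$$\frac{\partial s_{p^{*}+1,1}}{\partial t_{p^{*}+1,1}}\,\frac{\partial s_{p^{*}+1,2}}{\partial t_{p^{*}+1,2}}\cdots\frac{\partial s_{p^{*}+1,p^{*}}}{\partial t_{p^{*}+1,p^{*}}}\,\frac{\partial s_{p^{*}+1,p^{*}+1}}{\partial t_{p^{*}+1,p^{*}+1}} = t_{11}t_{22}\cdots t_{p^{*}p^{*}}\cdot 2t_{p^{*}+1,p^{*}+1}.$$

That is, $|\partial s/\partial t^{T}|_{+}$ becomes

$$2^{p^{*}}\left(\prod_{i=1}^{p^{*}} t_{ii}^{\,p^{*}-i+1}\right) t_{11}t_{22}\cdots t_{p^{*}p^{*}}\cdot 2t_{p^{*}+1,p^{*}+1} = 2^{p^{*}+1}\prod_{i=1}^{p^{*}+1} t_{ii}^{\,p^{*}+1-i+1},$$

which shows that the formula $|\partial s/\partial t^{T}|_{+} = 2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}$ holds when $p = p^{*}+1$,
indicating the required result.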
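The formula in Lemma 2 can be checked numerically at a single point. The sketch below, under the assumption that central differences are an acceptable stand-in for exact derivatives (they are exact up to rounding here because $s$ is quadratic in $t$), builds $\partial s/\partial t^{T}$ for $p = 3$ and compares its determinant with $2^{p}\prod_{i} t_{ii}^{p-i+1}$. All function names and the example values are hypothetical.

```python
def vech_pairs(p):
    # Index pairs ordered as (1,1),(2,1),(2,2),...,(p,p), written 0-based.
    return [(i, j) for i in range(p) for j in range(i + 1)]


def s_of_t(tvec, p):
    pairs = vech_pairs(p)
    T = [[0.0] * p for _ in range(p)]
    for (i, j), value in zip(pairs, tvec):
        T[i][j] = value
    # Non-duplicated elements of S = T T^T.
    return [sum(T[i][k] * T[j][k] for k in range(p)) for (i, j) in pairs]


def numeric_jacobian(tvec, p, h=1e-3):
    q = len(tvec)
    jac = [[0.0] * q for _ in range(q)]
    for col in range(q):
        up, down = list(tvec), list(tvec)
        up[col] += h
        down[col] -= h
        s_up, s_down = s_of_t(up, p), s_of_t(down, p)
        for row in range(q):
            # Central differences are exact (up to rounding) for quadratic maps.
            jac[row][col] = (s_up[row] - s_down[row]) / (2.0 * h)
    return jac


def determinant(matrix):
    # Gaussian elimination with partial pivoting.
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def lemma2_formula(tvec, p):
    diag = {i: v for (i, j), v in zip(vech_pairs(p), tvec) if i == j}
    value = 2.0 ** p
    for i in range(p):
        value *= diag[i] ** (p - i)  # exponent p - i + 1 in 1-based indexing
    return value


p_example = 3
t_example = [1.2, 0.5, 0.8, -0.3, 0.7, 1.5]  # t11, t21, t22, t31, t32, t33
jac_det = abs(determinant(numeric_jacobian(t_example, p_example)))
formula_value = lemma2_formula(t_example, p_example)
```

For these example values the closed form is $2^{3}\cdot 1.2^{3}\cdot 0.8^{2}\cdot 1.5 = 13.27104$, and `jac_det` should match it up to rounding.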


In the following theorem for a known Wishart density, we use
$\Gamma_{p}(n/2) = \pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\{(n-i+1)/2\}$, i.e., the $p$-variate gamma function (Anderson,
2003, Definition 7.2.1; Subsection 7.2, Equation (18); see also DLMF,
Section 35.3, https://dlmf.nist.gov/35.3), where $\Gamma(k) = \int_{0}^{\infty} v^{k-1}\exp(-v)\,dv$ $(k > 0)$
is the usual gamma function.


Theorem 1 Under the condition that the $n$ columns of $X$ independently follow
$N_p(0, I_p)$, the pdf of the Wishart distributed $S$ is given by

$$w_p(S\,|\,I_p, n) = \frac{\exp\{-\mathrm{tr}(S)/2\}\,|S|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)} \quad (n \ge p).$$

Proof. Consider the case of $t_{ij} = X_{ij}$ and $t_{ii} = \sqrt{\sum_{k=i}^{n} X_{ik}^{2}}$ $(i = 1,\dots,p;\ j = 1,\dots,i-1)$.
Since $X_{ij}$ $(i = 1,\dots,p;\ j = 1,\dots,n)$ are mutually independent, $t_{ij}$
$(i = 1,\dots,p;\ j = 1,\dots,i)$ are independent. Note that
$(TT^{T})_{ii} = \sum_{j=1}^{i} t_{ij}^{2} = (XX^{T})_{ii}$ $(i = 1,\dots,p)$ are independently chi-square
distributed with $n$ df, where $(\cdot)_{ij}$ is the $(i,j)$-th element of a matrix; and $t_{ii}$ is
chi-distributed with $n - i + 1$ df. Further, Lemma 1 shows that the distributions of the
off-diagonal elements $(TT^{T})_{ij}$ and $(XX^{T})_{ij}$ $(i = 1,\dots,p;\ j = 1,\dots,i-1)$ are the same.
That is, the distributions of $S = XX^{T}$ and $TT^{T}$ are the same when
$t_{ij}$ $(i = 1,\dots,p;\ j = 1,\dots,i)$ are distributed as above.
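The pdf of the constructed $t_{ij}$'s $(p \ge i \ge j \ge 1)$ denoted by $f_p(T)$ becomes

$$
\begin{aligned}
f_p(T) &= \left[\prod_{i=1}^{p} \frac{t_{ii}^{\,n-i}\exp(-t_{ii}^{2}/2)}{2^{\{(n-i+1)/2\}-1}\,\Gamma\{(n-i+1)/2\}}\right]\left\{\prod_{p \ge i > j \ge 1} \frac{\exp(-t_{ij}^{2}/2)}{(2\pi)^{1/2}}\right\} \\
&= \frac{\left(\prod_{i=1}^{p} t_{ii}^{\,n-i}\right)\exp\left(-\sum_{p \ge i \ge j \ge 1} t_{ij}^{2}/2\right)}{2^{(np/2)-p}\,\pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\{(n-i+1)/2\}} \\
&= \frac{\left(\prod_{i=1}^{p} t_{ii}^{\,n-i}\right)\exp\{-\mathrm{tr}(TT^{T})/2\}}{2^{(np/2)-p}\,\Gamma_p(n/2)}.
\end{aligned}
$$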

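In the above expression, the pdf of the chi-distributed $t_{ii}$ with $k$ df denoted by
$f_{\chi}(t_{ii}|k)$ is given by that of the chi-square distributed $u = t_{ii}^{2}$ with $k$ df, i.e.,
$f_{\chi^{2}}(u|k) = \dfrac{u^{(k/2)-1}}{2^{k/2}\Gamma(k/2)}\exp(-u/2)$, with the Jacobian $du/dt_{ii} = 2t_{ii}$, yielding

$$f_{\chi}(t_{ii}|k) = \frac{u^{(k/2)-1}}{2^{k/2}\Gamma(k/2)}\exp\left(-\frac{u}{2}\right)\frac{du}{dt_{ii}} = \frac{t_{ii}^{\,n-i}\exp(-t_{ii}^{2}/2)}{2^{\{(n-i+1)/2\}-1}\,\Gamma\{(n-i+1)/2\}}$$

as shown earlier, when $u = t_{ii}^{2}$ and $k = n - i + 1$.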
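Consider the transformation $T \to S$ in $S = XX^{T} = TT^{T}$. The Jacobian
$J(T \to S)$ of this transformation is given by the reciprocal of $J(S \to T)$ obtained
in Lemma 2 as $J(T \to S) = 1/|\partial s/\partial t^{T}|_{+} = \left(2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}\right)^{-1}$. Consequently,
using $|S|^{1/2} = |T| = t_{11}\cdots t_{pp}$, the pdf of $S$ becomes

$$
\begin{aligned}
w_p(S\,|\,I_p, n) &= f_p(T)\,J(T \to S) \\
&= \frac{\left(\prod_{i=1}^{p} t_{ii}^{\,n-i}\right)\exp\{-\mathrm{tr}(TT^{T})/2\}}{2^{(np/2)-p}\,\Gamma_p(n/2)\,2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}}
= \frac{\exp\{-\mathrm{tr}(S)/2\}\,|S|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)}.
\end{aligned}
$$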
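The last step of the proof is an algebraic identity, $f_p(T)\,J(T \to S) = w_p(TT^{T}|I_p, n)$, which can be checked on the log scale at an arbitrary point. The following minimal sketch assumes the product form of $f_p(T)$ (chi densities on the diagonal and standard normal densities below it); the function names and the example matrix are hypothetical.

```python
import math


def log_multigamma(a, p):
    # log Gamma_p(a) = {p(p-1)/4} log(pi) + sum_{i=1}^p log Gamma(a - (i-1)/2)
    return p * (p - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a - (i - 1) / 2.0) for i in range(1, p + 1))


def log_f_T(T, n):
    """Log of f_p(T): chi densities on the diagonal, N(0,1) densities below it."""
    p, total = len(T), 0.0
    for i in range(1, p + 1):
        k = n - i + 1  # df of the chi-distributed t_ii
        t = T[i - 1][i - 1]
        total += ((k - 1) * math.log(t) - t * t / 2.0
                  - (k / 2.0 - 1.0) * math.log(2.0) - math.lgamma(k / 2.0))
        for j in range(1, i):
            x = T[i - 1][j - 1]
            total += -x * x / 2.0 - 0.5 * math.log(2.0 * math.pi)
    return total


def log_jacobian_T_to_S(T):
    # Log of 1 / (2^p prod t_ii^(p-i+1)), the reciprocal of the result in Lemma 2.
    p = len(T)
    return -(p * math.log(2.0) + sum(
        (p - i + 1) * math.log(T[i - 1][i - 1]) for i in range(1, p + 1)))


def log_wishart_identity(T, n):
    """Log of w_p(S | I_p, n) evaluated at S = T T^T."""
    p = len(T)
    trace_S = sum(T[i][j] ** 2 for i in range(p) for j in range(i + 1))
    log_det_S = 2.0 * sum(math.log(T[i][i]) for i in range(p))
    return (-trace_S / 2.0 + (n - p - 1) / 2.0 * log_det_S
            - n * p / 2.0 * math.log(2.0) - log_multigamma(n / 2.0, p))


T_example = [[1.2, 0.0, 0.0], [0.5, 0.8, 0.0], [-0.3, 0.7, 1.5]]
n_example = 7
lhs = log_f_T(T_example, n_example) + log_jacobian_T_to_S(T_example)
rhs = log_wishart_identity(T_example, n_example)
```

The two quantities `lhs` and `rhs` should coincide up to floating-point rounding for any lower-triangular `T_example` with positive diagonal and any $n \ge p$.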


Remark 1 The pdf of the $t_{ij}$'s $(p \ge i \ge j \ge 1)$, i.e., $f_p(T)$ given above using Lemma
1 is algebraically equal to those of Anderson (2003, Equation (6), p. 253,
Corollary 7.2.1), Wijsman (1957, Equation (12)) and Kshirsagar (1959, Remarks).
However, a typical derivation by e.g., Anderson is an indirect one using
orthogonalization and the conditional normal density. Since Anderson's derivation
may seem complicated to beginning students/researchers
though it is almost self-contained, the corresponding didactic explanation of his
derivation is given below. Anderson (2003, Equation (2), p. 252) defined the
$n$-dimensional independent random vectors $v_i \sim N_n(0, I_n)$ $(i = 1,\dots,p)$ with
$X = (v_1, \dots, v_p)^{T}$.
Then, the Gram-Schmidt sequential orthogonalization is employed (Anderson,
2003, Equation (3), p. 253) as

$$w_i = v_i - \sum_{j=1}^{i-1} \frac{w_j^{T}v_i}{w_j^{T}w_j}\,w_j \quad (i = 2,\dots,p) \quad \text{and} \quad w_1 = v_1,$$


where he used the expression $v_j^{T}w_j$ for the denominator $w_j^{T}w_j$. Though $v_j^{T}w_j = w_j^{T}w_j$
$(j = 1,\dots,i)$ as will become apparent, $w_j^{T}w_j$ may be more natural and
appropriate. While he included the short derivation of the orthogonality among
the $w_i$'s by induction, it is repeated here with some added explanations. When $i = 2$,
we have

$$w_2^{T}w_1 = \{v_2 - w_1(w_1^{T}w_1)^{-1}w_1^{T}v_2\}^{T}w_1 = v_2^{T}w_1 - v_2^{T}w_1(w_1^{T}w_1)^{-1}w_1^{T}w_1 = 0,$$

showing the orthogonality. Suppose that

$$w_j^{T}w_k = 0 \quad (j, k = 1,\dots,i-1;\ j \ne k)$$

hold. Then, we have

$$
\begin{aligned}
w_k^{T}w_i &= w_k^{T}\left(v_i - \sum_{j=1}^{i-1}\frac{w_j^{T}v_i}{w_j^{T}w_j}\,w_j\right) = w_k^{T}v_i - \sum_{j=1}^{i-1}\frac{w_j^{T}v_i}{w_j^{T}w_j}\,w_k^{T}w_j \\
&= w_k^{T}v_i - \frac{w_k^{T}v_i}{w_k^{T}w_k}\,w_k^{T}w_k = 0 \quad (i = 2,\dots,p;\ k = 1,\dots,i-1),
\end{aligned}
$$

due to the assumption $w_j^{T}w_k = 0$ $(j, k = 1,\dots,i-1;\ j \ne k)$, showing the
required result $w_j^{T}w_k = 0$ $(j, k = 1,\dots,i;\ j \ne k)$. Recall that $v_j^{T}w_j = w_j^{T}w_j$
$(j = 1,\dots,i)$ mentioned earlier, which is obtained by $w_j^{T}w_k = 0$ $(j, k = 1,\dots,i;\ j \ne k)$.
In addition, we have

$$
\begin{aligned}
w_i &= v_i - \sum_{j=1}^{i-1}\frac{w_j^{T}v_i}{w_j^{T}w_j}\,w_j \\
&= v_i - (w_1, \dots, w_{i-1})\,\mathrm{diag}\{(w_1^{T}w_1)^{-1}, \dots, (w_{i-1}^{T}w_{i-1})^{-1}\}\,(w_1, \dots, w_{i-1})^{T}v_i \\
&= v_i - P_{W_{i-1}}v_i = (I_n - P_{W_{i-1}})v_i = Q_{W_{i-1}}v_i \quad (i = 2,\dots,p),
\end{aligned}
$$
where $P_{W_{i-1}} = W_{i-1}(W_{i-1}^{T}W_{i-1})^{-1}W_{i-1}^{T}$ is the idempotent (i.e., $P_{W_{i-1}}^{2} = P_{W_{i-1}}$)
and symmetric projection matrix transforming or projecting $v_i$ onto
the space spanned by the columns of $W_{i-1} = (w_1, \dots, w_{i-1})$ of full column
rank by assumption; and $Q_{W_{i-1}} = I_n - P_{W_{i-1}}$ is also an idempotent and
symmetric projection matrix yielding the residual vector $v_i - P_{W_{i-1}}v_i$ or the
projected vector on the space orthogonal to the column space of $W_{i-1}$ with
$v_i = P_{W_{i-1}}v_i + Q_{W_{i-1}}v_i$. Anderson (2003, p. 252) stated that "$w_i$ is the vector
from $v_i$ to the projection on $w_1, \dots, w_{i-1}$" with his Figure 7.1. He repeatedly
stressed the equivalence of the column space of $W_{i-1}$ and that of $v_1, \dots, v_{i-1}$ in
our expression.

Using the $w_1, \dots, w_{i-1}$ constructed by the Gram-Schmidt orthogonalization
or projection, Anderson (2003, p. 252) defined

$$t_{ii} = \|w_i\| = \sqrt{w_i^{T}w_i} \quad (i = 1,\dots,p)$$

and

$$t_{ij} = v_i^{T}w_j/\|w_j\| \quad (i = 2,\dots,p;\ j = 1,\dots,i-1),$$

which may be uniformly expressed by $t_{ij} = v_i^{T}w_j/\|w_j\|$ $(i = 2,\dots,p;\ j = 1,\dots,i)$
due to $v_j^{T}w_j = w_j^{T}w_j$ $(j = 1,\dots,i)$ mentioned earlier. Then, noting that
$w_i = v_i - P_{W_{i-1}}v_i$, we have


$$
\begin{aligned}
v_i &= w_i + P_{W_{i-1}}v_i = w_i + \sum_{j=1}^{i-1}\frac{w_j^{T}v_i}{w_j^{T}w_j}\,w_j = \sum_{j=1}^{i}\frac{w_j^{T}v_i}{w_j^{T}w_j}\,w_j = \sum_{j=1}^{i} t_{ij}\frac{w_j}{\|w_j\|}, \\
v_i^{T}v_j &= \sum_{k=1}^{j}\frac{t_{ik}}{\|w_k\|}\,\frac{t_{jk}}{\|w_k\|}\,w_k^{T}w_k = \sum_{k=1}^{j} t_{ik}t_{jk} \quad (p \ge i \ge j \ge 1)
\end{aligned}
$$

(Anderson, 2003, p. 252). In $v_i = \sum_{j=1}^{i} t_{ij}w_j/\|w_j\|$ $(i = 1,\dots,p)$, $w_j/\|w_j\|$ $(j = 1,\dots,i-1)$
is seen as the unit-norm vector representing the direction for the
$j$-th coordinate in the $i - 1$ coordinates given by $w_1, \dots, w_{i-1}$. He stated that
"$t_{ij}$, $j = 1,\dots,i-1$ are the first $i - 1$ coordinates in the coordinate system with
$w_1, \dots, w_{i-1}$ as the first coordinates axes" (p. 252). We also find that $t_{ij}$ is $\|w_j\|$
times the regression coefficient $b_{ij}$ for $v_i$ on $w_j$ since

$$t_{ij} = v_i^{T}w_j/\|w_j\| = (v_i^{T}w_j/w_j^{T}w_j)\,\|w_j\| = b_{ij}\|w_j\|.$$
The properties of the normality of $t_{ij} = v_i^{T}w_j/\|w_j\|$ $(i = 2,\dots,p;\ j = 1,\dots,i-1)$
and their mutual independence shown by Anderson are based on the
normality of the conditional distribution of the multivariate normal when $w_j$
$(j = 1,\dots,i-1)$ are given and the orthogonal transformation in $t_{ij} = v_i^{T}w_j/\|w_j\|$
$(i = 2,\dots,p;\ j = 1,\dots,i-1)$. That is, the standard normally-distributed variables
$t_{ij} = v_i^{T}w_j/\|w_j\|$ do not depend on $w_1, \dots, w_{i-1}$, indicating independence, with
$(w_j/\|w_j\|)^{T}w_k/\|w_k\| = \delta_{jk}$ $(j, k = 1,\dots,i-1)$, where $\delta_{jk}$ is the Kronecker
delta with $\delta_{jj} = 1$ and $\delta_{jk} = 0$ $(j \ne k)$ (Anderson, 2003, Theorem 3.3.1).
The independent property of the $t_{ii}$'s is given by
$t_{ii} = \left\{(XX^{T})_{ii} - \sum_{j=1}^{i-1} t_{ij}^{2}\right\}^{1/2}$.
Although the same result as shown above by the didactic explanation of Anderson's
derivation is directly given by Lemma 1, the two methods may be insightful
with compensatory properties. [end of Remark 1]
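The Gram-Schmidt construction described in Remark 1 can be traced numerically. The sketch below assumes that the rows of a small fixed matrix play the role of $v_1, \dots, v_p$, builds $w_i$, $t_{ii} = \|w_i\|$ and $t_{ij} = v_i^{T}w_j/\|w_j\|$, and checks that $TT^{T}$ reproduces $XX^{T}$. The helper names `dot` and `gram_schmidt_T` and the example data are illustrative assumptions, not taken from Anderson (2003) or the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_T(X):
    """Return (W, T) for the rows v_1,...,v_p of X (p <= n, full row rank)."""
    p = len(X)
    W, T = [], [[0.0] * p for _ in range(p)]
    for i, v in enumerate(X):
        w = list(v)
        for j, wj in enumerate(W):
            coef = dot(wj, v) / dot(wj, wj)  # regression coefficient of v_i on w_j
            w = [a - coef * b for a, b in zip(w, wj)]
            T[i][j] = dot(v, wj) / math.sqrt(dot(wj, wj))  # t_ij = v_i'w_j / ||w_j||
        T[i][i] = math.sqrt(dot(w, w))  # t_ii = ||w_i||
        W.append(w)
    return W, T


X_example = [[1.0, 2.0, 0.0, -1.0],
             [0.5, -1.0, 3.0, 2.0],
             [2.0, 0.0, 1.0, 1.0]]
W_example, T_gs = gram_schmidt_T(X_example)
p_gs = len(X_example)
S_from_X = [[dot(X_example[i], X_example[j]) for j in range(p_gs)] for i in range(p_gs)]
S_from_T = [[dot(T_gs[i], T_gs[j]) for j in range(p_gs)] for i in range(p_gs)]
max_gap = max(abs(S_from_X[i][j] - S_from_T[i][j])
              for i in range(p_gs) for j in range(p_gs))
```

Here `max_gap` should be at rounding level, the rows of `W_example` should be mutually orthogonal, and `T_gs` is lower-triangular with positive diagonal elements, which matches the Bartlett decomposition $XX^{T} = TT^{T}$.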


2.2 The Wishart density for general correlated cases 


For the correlated cases, four lemmas are provided. Lemma 3 is for three Jacobians
in the product of two lower-triangular matrices, where the first Jacobian
was used by Anderson (2003, Theorem 7.2.2) to derive the Wishart density for
general correlated cases while the remaining two are given for generality with
didactic purposes. Lemmas 4 and 5 are provided for the Jacobians in two alternative
derivations of the general Wishart density. The proof of Lemma 6 associated
with sufficient statistics is based on Ghosh and Sinha (2002).


Lemma 3 Suppose that $A = BC$, where $A$, $B$ and $C$ are $p \times p$ lower-triangular
matrices. Consider the variable transformation from the non-zero elements of
$C$ or $B$ to those of $A$. Then, the Jacobians $J(C \to A)$ and $J(B \to A)$ are
$\prod_{i=1}^{p} |b_{ii}|^{-i}$ and $\prod_{i=1}^{p} |c_{ii}|^{-(p-i+1)}$, respectively. When $B = C$,
$J(B \to A) = \prod_{p \ge i \ge j \ge 1} |b_{ii} + b_{jj}|^{-1}$.


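Proof. Note that Anderson (2003, p. 254) gave $J(C \to A)$. Since $a_{ij} = \sum_{k=j}^{i} b_{ik}c_{kj}$
$(p \ge i \ge j \ge 1)$, we have

$$
\begin{pmatrix} a_{11} \\ a_{21} \\ a_{22} \\ \vdots \\ a_{p1} \\ \vdots \\ a_{pp} \end{pmatrix}
=
\begin{pmatrix}
b_{11} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
* & b_{22} & 0 & \cdots & 0 & \cdots & 0 \\
* & * & b_{22} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
* & * & * & \cdots & b_{pp} & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\
* & * & * & \cdots & * & \cdots & b_{pp}
\end{pmatrix}
\begin{pmatrix} c_{11} \\ c_{21} \\ c_{22} \\ \vdots \\ c_{p1} \\ \vdots \\ c_{pp} \end{pmatrix},
$$

where the diagonal element of the lower-triangular matrix corresponding to the
row for $a_{ij}$ and the column for $c_{ij}$ is $b_{ii}$ $(p \ge i \ge j \ge 1)$; the asterisks indicate
zero or non-zero elements; and

$$
\begin{pmatrix} a_{11} \\ a_{21} \\ a_{22} \\ \vdots \\ a_{p1} \\ \vdots \\ a_{pp} \end{pmatrix}
=
\begin{pmatrix}
c_{11} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
* & c_{11} & 0 & \cdots & 0 & \cdots & 0 \\
* & * & c_{22} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
* & * & * & \cdots & c_{11} & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\
* & * & * & \cdots & * & \cdots & c_{pp}
\end{pmatrix}
\begin{pmatrix} b_{11} \\ b_{21} \\ b_{22} \\ \vdots \\ b_{p1} \\ \vdots \\ b_{pp} \end{pmatrix},
$$

where the corresponding diagonal element for $a_{ij}$ and $b_{ij}$ is $c_{jj}$ $(p \ge i \ge j \ge 1)$.
Since the inverses of the Jacobian matrices for $J(C \to A)$ and $J(B \to A)$ on
the right-hand sides of the above equations are lower-triangular, the Jacobians
become the reciprocals of the absolute values of the determinants $\prod_{i=1}^{p} b_{ii}^{\,i}$
and $\prod_{i=1}^{p} c_{ii}^{\,p-i+1}$, respectively. The result when $B = C$ is obtained by the
reciprocal of the determinant of the sum of the two lower-triangular matrices.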


Lemma 4 Suppose that $A = BCB^{T}$, where $A$ and $C$ are $p \times p$ symmetric
matrices; and $B$ is a lower-triangular matrix. Consider the variable transformation
from the non-duplicated elements of $C$ to those of $A$. Then, the Jacobian
$J(C \to A)$ is $|B|_{+}^{-(p+1)}$.


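Proof. Since the non-duplicated elements of $A$ using its diagonal and
infra-diagonal elements are $a_{ij} = \sum_{k=1}^{i}\sum_{l=1}^{j} b_{ik}c_{kl}b_{jl}$ $(p \ge i \ge j \ge 1)$, we have

$$\frac{\partial a_{ij}}{\partial c_{kl}} = b_{ik}b_{jl} \quad (p \ge i \ge j \ge 1;\ k = 1,\dots,i;\ l = 1,\dots,j),$$

which gives

$$
\begin{pmatrix} a_{11} \\ a_{21} \\ a_{22} \\ \vdots \\ a_{p1} \\ \vdots \\ a_{pp} \end{pmatrix}
=
\begin{pmatrix}
b_{11}b_{11} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
* & b_{22}b_{11} & 0 & \cdots & 0 & \cdots & 0 \\
* & * & b_{22}b_{22} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
* & * & * & \cdots & b_{pp}b_{11} & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\
* & * & * & \cdots & * & \cdots & b_{pp}b_{pp}
\end{pmatrix}
\begin{pmatrix} c_{11} \\ c_{21} \\ c_{22} \\ \vdots \\ c_{p1} \\ \vdots \\ c_{pp} \end{pmatrix},
$$

where the diagonal element of the lower-triangular matrix for $a_{ij}$ and $c_{ij}$ is
$\partial a_{ij}/\partial c_{ij} = b_{ii}b_{jj}$ $(p \ge i \ge j \ge 1)$. Since $J(C \to A)$ is the reciprocal of the
absolute value of the determinant of the above lower-triangular matrix, we obtain
$J(C \to A) = 1/\prod_{i=1}^{p} |b_{ii}|^{p+1} = |B|_{+}^{-(p+1)}$.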


Lemma 5 Suppose that $A = BCC^{T}B^{T}$, where $A$ is a $p \times p$ symmetric matrix;
and $B$ and $C$ are lower-triangular matrices. Consider the variable transformation
from the non-zero elements of $C$ to the non-duplicated elements of $A$. Then,
the Jacobian $J(C \to A)$ is $|B|_{+}^{-(p+1)}\big/\left(2^{p}\prod_{i=1}^{p} |c_{ii}|^{p-i+1}\right)$.
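Proof 1 The diagonal and infra-diagonal elements of $A$ are employed for its
non-duplicated ones without loss of generality. Then, define
$a = (a_{11}, a_{21}, a_{22}, \dots, a_{p1}, \dots, a_{pp})^{T}$ and $c = (c_{11}, c_{21}, c_{22}, \dots, c_{p1}, \dots, c_{pp})^{T}$ with the
elements lexicographically ordered. Since $B$, $C$ and $BC$ are lower-triangular, the
Jacobian matrix $\partial a/\partial c^{T} = \{\partial a_{ij}/\partial c_{kl}\}$ $(p \ge i \ge j \ge 1;\ p \ge k \ge l \ge 1)$ becomes
lower-triangular. This can be shown by

$$
\begin{aligned}
\frac{\partial a_{ij}}{\partial c_{kl}} &= \{B(E_{kl}C^{T} + CE_{lk})B^{T}\}_{ij} = (BE_{kl}C^{T}B^{T})_{ij} + (BCE_{lk}B^{T})_{ij} \\
&= b_{ik}(BC)_{jl} + (BC)_{il}b_{jk} \quad (p \ge i \ge j \ge 1;\ p \ge k \ge l \ge 1),
\end{aligned}
$$

where $E_{kl}$ is the matrix of an appropriate size, whose $(k, l)$th element is 1 with
the remaining ones being 0. The right-hand side of the last equation in the above
expression vanishes when $i < k$ or $\{i = k\} \cap \{j < l\}$. This condition indicates the
lower-triangular form of $\partial a/\partial c^{T} = \{\partial a_{ij}/\partial c_{kl}\}$. Then, the diagonal elements are

$$\frac{\partial a_{ij}}{\partial c_{ij}} = \{B(E_{ij}C^{T} + CE_{ji})B^{T}\}_{ij} = (BE_{ij}C^{T}B^{T})_{ij} = b_{ii}c_{jj}b_{jj} \quad (p \ge i > j \ge 1)$$

and

$$\frac{\partial a_{ii}}{\partial c_{ii}} = \{B(E_{ii}C^{T} + CE_{ii})B^{T}\}_{ii} = 2b_{ii}^{2}c_{ii} \quad (i = 1,\dots,p).$$

Since the determinant of the Jacobian matrix for $J(A \to C)$ is

$$
\begin{aligned}
\prod_{i=1}^{p}\prod_{j=1}^{i}\frac{\partial a_{ij}}{\partial c_{ij}}
&= \left(\prod_{i=2}^{p}\prod_{j=1}^{i-1}\frac{\partial a_{ij}}{\partial c_{ij}}\right)\prod_{i=1}^{p}\frac{\partial a_{ii}}{\partial c_{ii}} = 2^{p}\prod_{i=1}^{p}\prod_{j=1}^{i} b_{ii}c_{jj}b_{jj} \\
&= 2^{p}\left(\prod_{i=1}^{p} b_{ii}^{\,i}\right)\left(\prod_{i=1}^{p} b_{ii}^{\,p-i+1}\right)\left(\prod_{i=1}^{p} c_{ii}^{\,p-i+1}\right) = 2^{p}\prod_{i=1}^{p} b_{ii}^{\,p+1}c_{ii}^{\,p-i+1} \\
&= 2^{p}|B|^{p+1}\prod_{i=1}^{p} c_{ii}^{\,p-i+1},
\end{aligned}
$$

the Jacobian $J(C \to A)$ is the reciprocal of the absolute value of the above
quantity:

$$J(C \to A) = |B|_{+}^{-(p+1)}\Big/\left(2^{p}\prod_{i=1}^{p} |c_{ii}|^{p-i+1}\right),$$

which is the required result.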


Proof 2 The transformation $A = BCC^{T}B^{T}$ is seen in two steps. In the first
step, the transformation $C \to CC^{T}$ is considered, whose Jacobian is given by
Lemma 2 as $J(C \to CC^{T}) = 1\big/\left(2^{p}\prod_{i=1}^{p} |c_{ii}|^{p-i+1}\right)$. The second step is for the
transformation $CC^{T} \to A = BCC^{T}B^{T}$ with the Jacobian $J(CC^{T} \to A) = |B|_{+}^{-(p+1)}$,
which is given by Lemma 4. Then, the Jacobian $J(C \to A)$ is the
product of the two Jacobians due to the chain rule, which gives the required
result.
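The determinant behind Lemma 5 can be verified numerically at one point. The following sketch, in which all names and numerical values are hypothetical, builds $\partial a/\partial c^{T}$ for $A = BCC^{T}B^{T}$ with $p = 3$ by central differences (exact up to rounding because $A$ is quadratic in $C$) and compares its determinant with $2^{p}|B|^{p+1}\prod_{i} c_{ii}^{p-i+1}$, whose reciprocal is $J(C \to A)$ in the paper's convention.

```python
def vech_pairs(p):
    # Index pairs ordered as (1,1),(2,1),(2,2),...,(p,p), written 0-based.
    return [(i, j) for i in range(p) for j in range(i + 1)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def a_of_c(cvec, B):
    # Non-duplicated elements of A = B C C^T B^T for lower-triangular C.
    p = len(B)
    pairs = vech_pairs(p)
    C = [[0.0] * p for _ in range(p)]
    for (i, j), value in zip(pairs, cvec):
        C[i][j] = value
    BC = matmul(B, C)
    BC_t = [list(col) for col in zip(*BC)]
    A = matmul(BC, BC_t)
    return [A[i][j] for (i, j) in pairs]


def numeric_jacobian(func, x, h=1e-3):
    q = len(x)
    jac = [[0.0] * q for _ in range(q)]
    for col in range(q):
        up, down = list(x), list(x)
        up[col] += h
        down[col] -= h
        f_up, f_down = func(up), func(down)
        for row in range(q):
            jac[row][col] = (f_up[row] - f_down[row]) / (2.0 * h)
    return jac


def determinant(matrix):
    # Gaussian elimination with partial pivoting.
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


p5 = 3
B_example = [[2.0, 0.0, 0.0], [0.3, 1.5, 0.0], [-0.4, 0.6, 0.5]]
c_example = [1.2, 0.5, 0.8, -0.3, 0.7, 1.5]  # c11, c21, c22, c31, c32, c33
det_da_dc = abs(determinant(
    numeric_jacobian(lambda c: a_of_c(c, B_example), c_example)))

det_B = B_example[0][0] * B_example[1][1] * B_example[2][2]
lemma5_det = 2.0 ** p5 * abs(det_B) ** (p5 + 1)
for i, c_ii in enumerate([1.2, 0.8, 1.5], start=1):
    lemma5_det *= c_ii ** (p5 - i + 1)
jacobian_C_to_A = 1.0 / lemma5_det  # J(C -> A) in the paper's convention
```

With these values, $2^{3}\cdot 1.5^{4}\cdot(1.2^{3}\cdot 0.8^{2}\cdot 1.5) = 67.18464$, which `det_da_dc` should reproduce up to rounding; this is the product of the two determinants used in Proof 2.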


Suppose that each column of a $p \times n$ matrix $Y$ follows $N_p(0, \Sigma)$ with positive
definite $\Sigma$ independent of the other columns. Recall $X$ in Theorem 1. Let
$\Sigma = BB^{T}$ be the Cholesky decomposition, where $B$ is a fixed lower-triangular matrix
whose diagonal elements are positive for identification and convenience. Then,
each column of $Y = BX$ independently follows $N_p(0, \Sigma)$. Define $S_{\Sigma} = YY^{T} =
BXX^{T}B^{T} = BSB^{T}$, where $S = S_{I_p} = XX^{T} = TT^{T}$, and the $\{p(p+1)/2\} \times 1$
vector $s_{\Sigma} = (s_{\Sigma 11}, s_{\Sigma 21}, s_{\Sigma 22}, \dots, s_{\Sigma p1}, \dots, s_{\Sigma pp})^{T}$ with $S_{\Sigma} = \{s_{\Sigma ij}\}$
$(i, j = 1,\dots,p)$.


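Lemma 6 Define positive definite $\Sigma_i = B_iB_i^{T}$ and $S_{\Sigma_i} = B_iSB_i^{T}$ $(i = 1, 2)$,
where $S$ is as before. Denote the pdf's of $S_{\Sigma_i}$ at $S_{\Sigma}$ by $g_{\Sigma=\Sigma_i}(S_{\Sigma})$ $(i = 1, 2)$.
Then,

$$\frac{g_{\Sigma=\Sigma_1}(S_{\Sigma})}{g_{\Sigma=\Sigma_2}(S_{\Sigma})} = \frac{\phi_{pn}(Y\,|\,0, \Sigma_1)}{\phi_{pn}(Y\,|\,0, \Sigma_2)},$$

where $\phi_{pn}(Y\,|\,0, \Sigma_i) = \prod_{j=1}^{n}\phi_p\{(Y)_{\cdot j}\,|\,0, \Sigma_i\}$; $(Y)_{\cdot j}$ is the $j$-th column of $Y$;
and

$$\phi_p\{(Y)_{\cdot j}\,|\,0, \Sigma_i\} = \frac{\exp\{-(Y)_{\cdot j}^{T}\Sigma_i^{-1}(Y)_{\cdot j}/2\}}{(2\pi)^{p/2}|\Sigma_i|^{1/2}} \quad (i = 1, 2;\ j = 1,\dots,n).$$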


Proof. The derivation is given by the factorization theorem for the sufficient
statistic corresponding to $S_{\Sigma}$ for $\Sigma$ as used by Ghosh and Sinha (2002, Equation
(8)):

$$\phi_{pn}(Y\,|\,0, \Sigma_i) = g_{\Sigma=\Sigma_i}(S_{\Sigma})\,h(Y) \quad (i = 1, 2),$$

which gives the required result.


The Wishart density for general correlated cases (see e.g., Srivastava & Khatri,
1979, Theorem 3.2.1; Anderson, 2003, Theorem 7.2.2) is derived in different
ways.


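Theorem 2 Let each column of a $p \times n$ matrix $Y$ follow $N_p(0, \Sigma)$ with positive
definite $\Sigma$ independent of the other columns. Then, the pdf of $S_{\Sigma} = YY^{T}$ is

$$w_p(S_{\Sigma}\,|\,\Sigma, n) = \frac{\exp\{-\mathrm{tr}(\Sigma^{-1}S_{\Sigma})/2\}\,|S_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}\,|\Sigma|^{n/2}\,\Gamma_p(n/2)}.$$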


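Proof 1 Consider the transformation $T \to S_{\Sigma} = BTT^{T}B^{T}$. The Jacobian is
given by Lemma 5, when $A = S_{\Sigma}$, $B = B$ and $C = T$ with added restrictions
$b_{ii} > 0$ and $t_{ii} > 0$ $(i = 1,\dots,p)$, as

$$J(T \to S_{\Sigma}) = |B|^{-(p+1)}\Big/\left(2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}\right) = |\Sigma|^{-(p+1)/2}\Big/\left(2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}\right).$$

The pdf of $T$ denoted by $f_p(T)$ was given in the proof of Theorem 1. Then, we have

$$
\begin{aligned}
w_p(S_{\Sigma}\,|\,\Sigma, n) &= f_p(T)\,J(T \to S_{\Sigma}) \\
&= \frac{\exp\{-\mathrm{tr}(TT^{T})/2\}\left(\prod_{i=1}^{p} t_{ii}^{\,n-i}\right)|\Sigma|^{-(p+1)/2}}{2^{(np/2)-p}\,\Gamma_p(n/2)\,2^{p}\prod_{i=1}^{p} t_{ii}^{\,p-i+1}} \\
&= \frac{\exp\{-\mathrm{tr}(TT^{T})/2\}\,|\Sigma|^{-(p+1)/2}\prod_{i=1}^{p} t_{ii}^{\,n-p-1}}{2^{np/2}\,\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(B^{-1}S_{\Sigma}(B^{T})^{-1})/2\}\,|\Sigma|^{-(p+1)/2}\,|B^{-1}S_{\Sigma}(B^{T})^{-1}|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(\Sigma^{-1}S_{\Sigma})/2\}\,|S_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}\,|\Sigma|^{n/2}\,\Gamma_p(n/2)},
\end{aligned}
$$

where $\mathrm{tr}(B^{-1}S_{\Sigma}(B^{T})^{-1}) = \mathrm{tr}((B^{T})^{-1}B^{-1}S_{\Sigma}) = \mathrm{tr}\{(BB^{T})^{-1}S_{\Sigma}\} = \mathrm{tr}(\Sigma^{-1}S_{\Sigma})$
and $|B^{-1}S_{\Sigma}(B^{T})^{-1}| = |S_{\Sigma}||\Sigma|^{-1}$ are used. The last expression gives the required
result.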


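Proof 2 Employ the two-step transformation $T \to S = TT^{T} \to S_{\Sigma} = BSB^{T}$.
The first step was used in Theorem 1. The Jacobian $J(T \to S = TT^{T})$ in the
first step is given by Lemma 2 by taking the reciprocal of the last result of the
lemma while $J(S \to S_{\Sigma} = BSB^{T})$ is obtained by Lemma 4. That is,

$$
\begin{aligned}
w_p(S_{\Sigma}\,|\,\Sigma, n) &= f_p(T)\,J(T \to S)\,J(S \to S_{\Sigma}) \\
&= \frac{\exp\{-\mathrm{tr}(S)/2\}\,|S|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)}\,J(S \to S_{\Sigma}) \\
&= \frac{\exp\{-\mathrm{tr}(S)/2\}\,|S|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)}\,|\Sigma|^{-(p+1)/2} \\
&= \frac{\exp\{-\mathrm{tr}(\Sigma^{-1}S_{\Sigma})/2\}\,|\Sigma^{-1}S_{\Sigma}|^{(n-p-1)/2}\,|\Sigma|^{-(p+1)/2}}{2^{np/2}\,\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(\Sigma^{-1}S_{\Sigma})/2\}\,|S_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}\,|\Sigma|^{n/2}\,\Gamma_p(n/2)}.
\end{aligned}
$$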


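Proof 3 (Anderson, 2003, Theorem 7.2.2) Anderson used an alternative two-step
transformation $T \to T^{*} = BT \to S_{\Sigma} = T^{*}T^{*T}$. The Jacobian $J(T \to T^{*})$ is
given by the first result of Lemma 3 while $J(T^{*} \to S_{\Sigma})$ is given by the reciprocal
of the last result in Lemma 2 when $T = T^{*}$. That is,

$$
\begin{aligned}
w_p(S_{\Sigma}\,|\,\Sigma, n) &= f_p(T)\,J(T \to T^{*})\,J(T^{*} \to S_{\Sigma}) \\
&= \frac{\exp\{-\mathrm{tr}(TT^{T})/2\}\prod_{i=1}^{p} t_{ii}^{\,n-i}}{2^{(np/2)-p}\,\Gamma_p(n/2)}\left(\prod_{i=1}^{p} b_{ii}^{\,i}\right)^{-1}\left(2^{p}\prod_{i=1}^{p} t_{ii}^{*\,p-i+1}\right)^{-1} \\
&= \frac{\exp\{-\mathrm{tr}(TT^{T})/2\}\prod_{i=1}^{p} (t_{ii}^{*}/b_{ii})^{n-i}}{2^{np/2}\,\Gamma_p(n/2)\left(\prod_{i=1}^{p} b_{ii}^{\,i}\right)\prod_{i=1}^{p} t_{ii}^{*\,p-i+1}}
= \frac{\exp\{-\mathrm{tr}(TT^{T})/2\}\prod_{i=1}^{p} t_{ii}^{*\,n-p-1}}{2^{np/2}\left(\prod_{i=1}^{p} b_{ii}^{\,n}\right)\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(\Sigma^{-1}S_{\Sigma})/2\}\,|S_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}\,|\Sigma|^{n/2}\,\Gamma_p(n/2)}.
\end{aligned}
$$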


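Proof 4 Use Theorem 1 and Lemma 6 when $\Sigma_1 = I_p$ and $\Sigma_2 = B_2B_2^{T} =
\Sigma^{1/2}(\Sigma^{1/2})^{T} = \Sigma$. Then, we have

$$
\begin{aligned}
w_p(S_{\Sigma}\,|\,\Sigma, n) &= w_p(S_{\Sigma}\,|\,I_p, n)\,\frac{\phi_{pn}(Y\,|\,0, \Sigma)}{\phi_{pn}(Y\,|\,0, I_p)} \\
&= \frac{\exp\{-\mathrm{tr}(S_{\Sigma})/2\}\,|S_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}\,\Gamma_p(n/2)}\cdot\frac{\exp\{-\mathrm{tr}(YY^{T}\Sigma^{-1})/2\}\big/\{(2\pi)^{pn/2}|\Sigma|^{n/2}\}}{\exp\{-\mathrm{tr}(YY^{T})/2\}\big/(2\pi)^{pn/2}} \\
&= \frac{\exp\{-\mathrm{tr}(\Sigma^{-1}S_{\Sigma})/2\}\,|S_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}\,|\Sigma|^{n/2}\,\Gamma_p(n/2)}.
\end{aligned}
$$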
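The relation between the correlated and uncorrelated densities used in Proof 2, $w_p(S_{\Sigma}|\Sigma, n) = w_p(S|I_p, n)\,|\Sigma|^{-(p+1)/2}$ with $S_{\Sigma} = BSB^{T}$, can be checked numerically. The sketch below is restricted to $p = 2$ so that determinants and inverses have closed forms; the function names and the example matrices are hypothetical.

```python
import math


def log_multigamma(a, p):
    # log Gamma_p(a) = {p(p-1)/4} log(pi) + sum_{i=1}^p log Gamma(a - (i-1)/2)
    return p * (p - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a - (i - 1) / 2.0) for i in range(1, p + 1))


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def log_wishart_2x2(S, Sigma, n):
    """Log of w_2(S | Sigma, n) in Theorem 2 for symmetric 2x2 S and Sigma."""
    p = 2
    det_sigma = det2(Sigma)
    # tr(Sigma^{-1} S) using the closed-form inverse of a symmetric 2x2 matrix.
    trace_term = (Sigma[1][1] * S[0][0] - 2.0 * Sigma[0][1] * S[0][1]
                  + Sigma[0][0] * S[1][1]) / det_sigma
    return (-trace_term / 2.0 + (n - p - 1) / 2.0 * math.log(det2(S))
            - n * p / 2.0 * math.log(2.0) - n / 2.0 * math.log(det_sigma)
            - log_multigamma(n / 2.0, p))


B2 = [[2.0, 0.0], [0.5, 1.5]]          # lower-triangular Cholesky factor
B2_t = [[2.0, 0.5], [0.0, 1.5]]
Sigma2 = matmul2(B2, B2_t)              # Sigma = B B^T
S2 = [[3.0, 1.0], [1.0, 2.0]]           # a positive definite value of S
S_sigma2 = matmul2(matmul2(B2, S2), B2_t)  # S_Sigma = B S B^T
n2 = 5
identity2 = [[1.0, 0.0], [0.0, 1.0]]

log_correlated = log_wishart_2x2(S_sigma2, Sigma2, n2)
log_via_jacobian = (log_wishart_2x2(S2, identity2, n2)
                    - (2 + 1) / 2.0 * math.log(det2(Sigma2)))
```

The two log-densities should agree up to rounding, which reflects the Jacobian $|\Sigma|^{-(p+1)/2}$ of $S \to S_{\Sigma}$ given by Lemma 4.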


3 Remarks and Conclusion 


For the general correlated cases, four proofs are shown in Theorem 2. The
one-step first proof uses $f_p(T)$ with $J(T \to S_{\Sigma})$ given by Lemma 5, where
$S_{\Sigma} = BTT^{T}B^{T}$ with lower-triangular $B$ and $T$ is seen as a two-fold Bartlett
(Cholesky) decomposition or a usual Bartlett (1933) decomposition $S_{\Sigma} = BT(BT)^{T}$ in terms
of lower-triangular $BT$. The two-step second proof uses $f_p(T)$ with $J(T \to S =
TT^{T})$ and $J(S \to S_{\Sigma} = BSB^{T})$ obtained by Lemmas 2 and 4, respectively.
Anderson (2003)'s two-step third proof uses $f_p(T)$ with $J(T \to T^{*} = BT)$ and
$J(T^{*} \to S_{\Sigma})$ given by Lemmas 3 and 2, respectively. Among the four proofs,
the first and fourth ones are relatively simple. The remaining two-step proofs
seem to be comparable. It is found that in order to derive the final Jacobian by
Proofs 2 and 3, Lemma 2 is firstly and secondly used, respectively. When only
the pdf of $S\,(= S_{I_p})$ is focused on, Proof 2 may be the simplest though the
same result is immediately obtained from the pdf of $S_{\Sigma}$ substituting $\Sigma = I_p$.
In each of the four proofs, $f_p(T)$ is used. Two derivations for $f_p(T)$ were
shown. The first method using Lemma 1 is much simpler than that used by
Anderson (2003) as detailed in Remark 1. The author believes that this
simplification will reduce the difficulties frequently encountered when beginning
students/researchers master the Wishart density. Note that when the Wishart
density $w_p(S\,|\,I_p, n)$ is given, $f_p(T)$ is obtained using $J(S \to T)$ in Lemma 2
as easily as in the transformation $T \to S$, which is the reversed problem (see
Bartlett, 1933; Muirhead, 1982, Theorem 3.2.14).


Remark 2 Lemma 1 gave the justification of $XX^{T} = TT^{T}$ in distribution with mutually
independent normal $t_{ij}$ $(p \ge i > j \ge 1)$ and chi-distributed $t_{ii}$ $(i = 1,\dots,p)$. While the
chi-square distribution of $(TT^{T})_{ii}$ is obvious, the distribution of $(TT^{T})_{ij}$ $(i > j)$
is that of the product sum of $n$ pairs of independent normals (the product-sum
normal for short). The pdf and mgf of the product-sum normal in the case of a
possibly correlated single pair were given by Craig (1936) (see also Ogasawara,
2023a, Remarks S1-S4). For current developments of this issue, see e.g.,
Seijas-Macías and Oliveira (2012), Seijas-Macías, Oliveira, Oliveira, and Leiva (2020),
and Gaunt (2022).


Remark 3 As addressed earlier, the complicated property found in many of
the proofs of the Wishart density seems to be due partially to the associated
Jacobians in e.g., Srivastava and Khatri (1979, Section 3.2) and Anderson (2003,
Section 7.2). The proof of the Wishart density in Theorem 1 is similar to that in
Srivastava and Khatri (1979, Section 3.2). Though the Jacobian in Lemma 2 was
also used by Srivastava and Khatri, we did not use the Jacobian of $X \to \{T, V^{*}\}$
in $X = TV^{*}$, where $V^{*}$ is a $p \times n$ semi-orthonormal matrix with $V^{*}V^{*T} = I_p$
(see Srivastava & Khatri, 1979, Exercise 1.33). Instead, we used the marginal
chi and normal distributions for $T$ as in Anderson (2003).

As shown earlier, in the three proofs of the Wishart density $w_p(S_{\Sigma}\,|\,\Sigma, n)$,
the Bartlett-like Cholesky decomposition $\Sigma = BB^{T}$ is used for non-stochastic
$\Sigma$. Though this factorization gives simple results, other factorizations can also
be used with $\Sigma = BGG^{T}B^{T} = BG(BG)^{T} = DD^{T}$, where $GG^{T} = G^{T}G = I_p$
and $D = BG$. For illustration, Proof 5 using $D = \Sigma^{1/2}$ with $(\Sigma^{1/2})^{2} = \Sigma$ will be
shown in Appendix A for didactic purposes with associated remarks. The concise
derivation of Khatri (1963) will be explained in Appendix B. The Bartlett
decomposition $S = TT^{T}$ can also be replaced by other ones with the same number
of random variables. The case called the exchanged Bartlett decomposition will
be shown in Appendix C.


Conclusion Among Proofs 1 to 4 of the Wishart distribution given earlier and 
Proofs 5 to 7 to be shown in the appendix for expository purposes, Proof 4 
using our Lemma 1 for the equivalence of the distributions of the product-sum 
normal and the product of the chi and standard normal as well as Lemma 6 
for the factorization theorem given by Ghosh and Sinha (2002) is the simplest. 
Since Proof 4 uses elementary and self-contained methods, the proof may be 
understood by beginning students/researchers without much difficulty. 
Appendix A An alternative proof of the Wishart density
for correlated cases


Let $\Sigma^{1/2}$ be a symmetric matrix-square-root of $\Sigma$ satisfying $(\Sigma^{1/2})^{2} = \Sigma$. Then,
we have $(Y)_{\cdot i} = (\Sigma^{1/2}X)_{\cdot i} \sim N_p(0, \Sigma)$ as $(BX)_{\cdot i} \sim N_p(0, \Sigma)$ $(i = 1,\dots,n)$,
which gives $S_{\Sigma} = YY^{T} = \Sigma^{1/2}XX^{T}\Sigma^{1/2} = \Sigma^{1/2}S\Sigma^{1/2}$, where $S_{\Sigma}$ is
redefined using $\Sigma^{1/2}$. Let $s_{\Sigma} = (s_{\Sigma 11}, s_{\Sigma 21}, s_{\Sigma 22}, \dots, s_{\Sigma p1}, \dots, s_{\Sigma pp})^{T}$ using the redefined
$S_{\Sigma} = \{s_{\Sigma ij}\}$ $(i, j = 1,\dots,p)$. Then, $D_p s_{\Sigma} = \mathrm{vec}(S_{\Sigma})$ follows, where $D_p$ of full
column rank is the $p^{2} \times \{p(p+1)/2\}$ duplication matrix consisting of 0's and 1's
(Magnus & Neudecker, 1999, Chapter 3, Section 8); and $\mathrm{vec}(\cdot)$ is the vectorizing
operator stacking the columns of a matrix in parentheses sequentially with the
first column on the top. Using the formula $\mathrm{vec}(ABC) = (C^{T} \otimes A)\mathrm{vec}(B)$ (see
Magnus & Neudecker, 1999, Chapter 2, Theorem 2), where $\otimes$ denotes the direct
or Kronecker product, we obtain

$$D_p s_{\Sigma} = \mathrm{vec}(S_{\Sigma}) = \mathrm{vec}(\Sigma^{1/2}S\Sigma^{1/2}) = (\Sigma^{1/2} \otimes \Sigma^{1/2})\mathrm{vec}(S) = (\Sigma^{1/2} \otimes \Sigma^{1/2})D_p s.$$

Pre-multiplying the above equation by $(D_p^{T}D_p)^{-1}D_p^{T} = D_p^{+}$, which is the left-
or Moore-Penrose generalized inverse of $D_p$ with $D_p^{+}D_p = I_{p(p+1)/2}$ (see Magnus
& Neudecker, 1999, Chapter 3, Section 8), we have

$$s_{\Sigma} = D_p^{+}(\Sigma^{1/2} \otimes \Sigma^{1/2})D_p s.$$

The Jacobian of the transformation $S_{\Sigma} \to S$ or equivalently $s_{\Sigma} \to s$ is given
by $|D_p^{+}(\Sigma^{1/2} \otimes \Sigma^{1/2})D_p|_{+} = |\Sigma|^{(p+1)/2}$, which is derived using the following
lemma.


Lemma 7 (Magnus & Neudecker, 1986, Equation (7.11)). Let $A$ be a $p \times p$
positive definite matrix with distinct eigenvalues. Then, $|D_p^{+}(A \otimes A)D_p| =
|A|^{p+1}$.


Proof. While Magnus and Neudecker (1986) used Shur’s theorem for the ex- 
istence of a non-singular matrix V satisfying V-'AV = M, where M is an 
upper-triangular matrix for a general square matrix A, we use a familiar special 
case of the theorem as L'AL = A when A = LAL? with LLT = LTL = I, 
and A = diag(Ai,...,Ap) (Ar > ... > Ap > 0), where the columns of L and 
Ai(i = 1,...,p) are the eigenvectors and eigenvalues of A, respectively. Note that 


Dj (LT @Lt)D,D;(A @ A)D,D;(L@ L)D, 
=D} (L* @L")(A@A)(L@L)D, 


= Di {(LTAL) @ (LTAL)}D, = D#(A@ A)Dy, 


where D,»,D}(A ® A) = (A @ A)D,D} and D,DJD, = D, (Magnus & 
Neudecker, 1999, Chapter 3, Theorem 13) are used, followed by the transfor- 
mation given by (A ® B)(C ®@ D) = (AC) ® (BD) when multiplications are 
defined. 

Note that D}(LT @ L™)D, = {D}(L@L)D,}~* since 
Dj (L* ®L")D,Di (L@L)D, = D} (L*@L")(L@L)D, = DID, = Ip41y/2- 


Consequently, we can write as 


D+(L? @LT)D,D*(A @ A)D,Dt(L @ L)D, 
=B-'!Dt(A @A)D,B = Di (A@ A)Dy, 
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which shows that the eigenvalues of DJ (A @ A)D, are the same as those of 
D} (A @ A)D (see e.g., Magnus & Neudecker, 1999, Chapter 1, Theorem 5). 
Employ the double subscript notation as used earlier for the row numbers 7 
and j (p > i > j > 1) and column numbers & and | (p > k > 1 > 1) of 
the {p(p + 1)/2} x {p(p + 1)/2} matrix D(A @ A)D,. These numbers cor- 
respond to the subscripts of the elements of e.g., the {p(p + 1)/2} x 1 vector 
s= (si, S215 5225 +++) Spl, ery Seal 

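Consider $(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_p$, where the $(k, k)$th columns of $(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_p$ $(k = 1, ..., p)$ are unchanged from the corresponding ones of $\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda}$ while the $(k, l)$th columns $(p \ge k > l \ge 1)$ of $(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_p$ are combined ones given by the sum of the $(k, l)$- and $(l, k)$-th columns of $\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda}$ such that e.g.,

$$
(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_2 = \mathrm{diag}(\lambda_1^2, \lambda_1\lambda_2, \lambda_2\lambda_1, \lambda_2^2)
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \lambda_1^2 & 0 & 0 \\ 0 & \lambda_1\lambda_2 & 0 \\ 0 & \lambda_2\lambda_1 & 0 \\ 0 & 0 & \lambda_2^2 \end{pmatrix}
$$

when $p = 2$. For the second transformation $\mathbf{D}_p^{+}(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_p$, noting that $\mathbf{D}_p^{+} = (\mathbf{D}_p^{\mathrm{T}}\mathbf{D}_p)^{-1}\mathbf{D}_p^{\mathrm{T}}$ consists of 1's, 1/2's and 0's as

$$
\mathbf{D}_2^{+} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
$$

we find that $\mathbf{D}_p^{+}(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_p$ is the $\{p(p+1)/2\} \times \{p(p+1)/2\}$ diagonal matrix whose diagonal elements are $\lambda_i^2$ $(i = 1, ..., p)$ and $\lambda_i\lambda_j$ $(p \ge i > j \ge 1)$ as $\mathbf{D}_2^{+}(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_2 = \mathrm{diag}(\lambda_1^2, \lambda_2\lambda_1, \lambda_2^2)$. Then, we have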


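$$
|\mathbf{D}_p^{+}(\mathbf{A} \otimes \mathbf{A})\mathbf{D}_p| = |\mathbf{D}_p^{+}(\boldsymbol{\Lambda} \otimes \boldsymbol{\Lambda})\mathbf{D}_p|
= \prod_{i=1}^{p} \lambda_i^2 \prod_{p \ge i > j \ge 1} \lambda_i\lambda_j = \prod_{i=1}^{p} \lambda_i^{p+1} = |\mathbf{A}|^{p+1}.
$$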
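As a numerical sanity check of Lemma 7 (not part of the original proof), the following sketch builds $\mathbf{D}_p$ and $\mathbf{D}_p^{+}$ explicitly and compares both sides of the lemma in exact rational arithmetic. The helper names (`det`, `kron`, `duplication`, `duplication_pinv`, `lemma7_matrix`) and the test matrices are illustrative assumptions. The half-vectorization is taken column by column, which may differ from the ordering used above only by a simultaneous row and column permutation and so leaves the determinant unchanged.

```python
from fractions import Fraction


def det(m):
    """Determinant via Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n, d = len(a), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def vech_index(p):
    # (i, j) pairs of the lower triangle, column by column (0-based).
    return [(i, j) for j in range(p) for i in range(j, p)]


def duplication(p):
    """D_p with vec(S) = D_p vech(S) for symmetric S (column-major vec)."""
    idx = vech_index(p)
    d = [[0] * len(idx) for _ in range(p * p)]
    for col, (i, j) in enumerate(idx):
        d[j * p + i][col] = 1
        d[i * p + j][col] = 1
    return d


def duplication_pinv(p):
    """D_p^+ = (D_p^T D_p)^{-1} D_p^T, whose entries are 0, 1/2 and 1."""
    idx = vech_index(p)
    dp = [[Fraction(0)] * (p * p) for _ in idx]
    for row, (i, j) in enumerate(idx):
        w = Fraction(1) if i == j else Fraction(1, 2)
        dp[row][j * p + i] = w
        dp[row][i * p + j] = w
    return dp


def lemma7_matrix(a):
    """D_p^+ (A kron A) D_p for a square matrix A."""
    p = len(a)
    return matmul(matmul(duplication_pinv(p), kron(a, a)), duplication(p))


# p = 2 diagonal case from the text: diag(lambda_1^2, lambda_2 lambda_1, lambda_2^2).
diag_case = lemma7_matrix([[3, 0], [0, 2]])
# Symmetric positive definite examples (arbitrary illustrative values).
a2 = [[2, 1], [1, 3]]
a3 = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
print(diag_case)
print(det(lemma7_matrix(a2)), det(a2) ** 3)
print(det(lemma7_matrix(a3)), det(a3) ** 4)
```

Exact fractions are used so that the two determinants can be compared with `==` rather than within a floating-point tolerance.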


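Proof 5 of the Wishart density in Theorem 2. The Jacobian of the transformation $\mathbf{S}_{\Sigma} \to \mathbf{S}$ or equivalently $\mathbf{s}_{\Sigma} \to \mathbf{s}$ is given by Lemma 7 as $|\mathbf{D}_p^{+}(\boldsymbol{\Sigma}^{1/2} \otimes \boldsymbol{\Sigma}^{1/2})\mathbf{D}_p|_{+} = |\boldsymbol{\Sigma}|^{(p+1)/2}$. Consequently, $J(\mathbf{s} \to \mathbf{s}_{\Sigma})$ becomes $|\boldsymbol{\Sigma}|^{-(p+1)/2}$. Then, the pdf of $\mathbf{S}_{\Sigma}$ is obtained by that of $\mathbf{S} = \boldsymbol{\Sigma}^{-1/2}\mathbf{S}_{\Sigma}\boldsymbol{\Sigma}^{-1/2}$ in Theorem 1 and $J(\mathbf{s} \to \mathbf{s}_{\Sigma}) = |\boldsymbol{\Sigma}|^{-(p+1)/2}$ as

$$
\begin{aligned}
w_p(\mathbf{S}_{\Sigma}|\boldsymbol{\Sigma}, n) &= \frac{\exp\{-\mathrm{tr}(\mathbf{S})/2\}|\mathbf{S}|^{(n-p-1)/2}}{2^{np/2}\Gamma_p(n/2)} |\boldsymbol{\Sigma}|^{-(p+1)/2} \\
&= \frac{\exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1/2}\mathbf{S}_{\Sigma}\boldsymbol{\Sigma}^{-1/2})/2\}\, |\boldsymbol{\Sigma}^{-1/2}\mathbf{S}_{\Sigma}\boldsymbol{\Sigma}^{-1/2}|^{(n-p-1)/2}\, |\boldsymbol{\Sigma}|^{-(p+1)/2}}{2^{np/2}\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}\, |\mathbf{S}_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}|\boldsymbol{\Sigma}|^{n/2}\Gamma_p(n/2)}.
\end{aligned}
$$

Remark 4. Once Lemma 7 for the Jacobian of $\mathbf{S}_{\Sigma} \to \mathbf{S}$ is given, Theorem 2 for the Wishart density in the general correlated case is immediately obtained. Conversely, when the Wishart densities for $\mathbf{S}$ and $\mathbf{S}_{\Sigma}$ are available, the Jacobian is easily given by comparing the two densities using $\mathbf{S} = \boldsymbol{\Sigma}^{-1/2}\mathbf{S}_{\Sigma}\boldsymbol{\Sigma}^{-1/2}$, which was employed by Anderson (2003, Theorem 7.3.3).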
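The Jacobian used in Proof 5 can also be checked by direct differentiation, without forming duplication matrices. The sketch below is an illustration only: the function `jacobian_of_congruence` and the matrix `a`, which plays the role of $\boldsymbol{\Sigma}^{1/2}$, are assumptions made for the example. Because $\mathbf{S} \mapsto \mathbf{A}\mathbf{S}\mathbf{A}$ is linear in $\mathbf{S}$, the Jacobian matrix is obtained exactly by applying the map to symmetric unit perturbations.

```python
from fractions import Fraction


def det(m):
    """Determinant via Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n, d = len(a), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def vech(s):
    """Non-duplicated elements of a symmetric matrix, column by column."""
    p = len(s)
    return [s[i][j] for j in range(p) for i in range(j, p)]


def jacobian_of_congruence(a):
    """Matrix of d vech(A S A) / d vech(S)^T for a symmetric matrix A."""
    p = len(a)
    columns = []
    for l in range(p):
        for k in range(l, p):
            e = [[0] * p for _ in range(p)]
            e[k][l] = e[l][k] = 1          # unit change in s_kl = s_lk
            columns.append(vech(matmul(matmul(a, e), a)))
    return [list(row) for row in zip(*columns)]


a = [[2, 1, 0], [1, 2, 1], [0, 1, 3]]      # illustrative stand-in for Sigma^{1/2}
sigma = matmul(a, a)
p = len(a)
jac = det(jacobian_of_congruence(a))
# |Sigma|^{(p+1)/2} equals |A|^{p+1}; both sides are compared without fractional powers.
print(jac, det(a) ** (p + 1), det(sigma))
```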


Appendix B On Khatri's (1963) self-contained concise derivation


Khatri (1963) is referred to only by Kshirsagar (1972, p. 59) and Srivastava and Khatri (1979, p. 76), to the author's knowledge. The derivation depends on the integral $\pi^{k/2}q^{(k/2)-1}/\Gamma(k/2) = \int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k$, where $q$ is a positive constant and the $x_i$'s with $\mathbf{x} = (x_1, ..., x_k)^{\mathrm{T}}$ independently follow the standard normal. This integral is typically obtained in a proof of the chi-square distribution with $k$ df using the surface area $S_{k-1} = 2\pi^{k/2}r^{k-1}/\Gamma(k/2)$ of the $(k-1)$-sphere with the radius $r = q^{1/2}$ in the $k$-dimensional Euclidean space and $dr = \{1/(2q^{1/2})\}dq$:


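$$
\begin{aligned}
\left\{ \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi}} \exp(-x_i^2/2) \Big|_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} \right\} \int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k
&= \frac{1}{(2\pi)^{k/2}} \exp\left(-\frac{q}{2}\right) \frac{2\pi^{k/2}r^{k-1}}{\Gamma(k/2)} \frac{dr}{dq} \\
&= \frac{1}{(2\pi)^{k/2}} \exp\left(-\frac{q}{2}\right) \frac{2\pi^{k/2}q^{(k-1)/2}}{\Gamma(k/2)} \frac{1}{2q^{1/2}} \\
&= \frac{1}{2^{k/2}\Gamma(k/2)} q^{(k/2)-1} \exp\left(-\frac{q}{2}\right),
\end{aligned}
$$

yielding

$$
\int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k = \frac{2\pi^{k/2}q^{(k-1)/2}}{\Gamma(k/2)} \frac{1}{2q^{1/2}} = \frac{\pi^{k/2}q^{(k/2)-1}}{\Gamma(k/2)}.
$$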


Khatri (1963, p. 53) stated that $\int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k = \pi^{k/2}q^{k/2}/\Gamma(k/2)$ using our notation, where $q^{k/2}$ rather than $q^{(k/2)-1}$ is probably a typo since otherwise the correct factor $|\mathbf{S}|^{(n-p-1)/2}$ corresponding to $q^{(k/2)-1}$ when $k = n-p+1$ in his subsequent expression of the Wishart density does not follow. An alternative short derivation of $\int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k$ was given by Ogasawara (2022) as follows. Suppose that the pdf of the chi-square with $k$ df, which is equal to that of the gamma with the shape parameter $k/2$ and the scale parameter 2, is obtained by a different method using e.g., the property that the sum of independent gamma-distributed variables with the same scale parameter is gamma-distributed with the shape parameter equal to the sum of the individual shape parameters and with the same scale. Note that the beta integral or the moment generating function can be used for the derivation of this property. Then, we have


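$$
\left\{ \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi}} \exp(-x_i^2/2) \Big|_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} \right\} \int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k = \frac{q^{(k/2)-1}\exp(-q/2)}{2^{k/2}\Gamma(k/2)},
$$

which gives

$$
\int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k = \frac{1}{\prod_{i=1}^{k} (1/\sqrt{2\pi}) \exp(-x_i^2/2) \big|_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q}} \cdot \frac{q^{(k/2)-1}\exp(-q/2)}{2^{k/2}\Gamma(k/2)} = \frac{\pi^{k/2}q^{(k/2)-1}}{\Gamma(k/2)}.
$$

We find that this derivation without using the area of the $(k-1)$-sphere is similar to that by Anderson (2003) mentioned in Remark 4.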
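The corrected integral can be checked numerically under one reading of the notation, assumed here for illustration: $\int_{\mathbf{x}^{\mathrm{T}}\mathbf{x}=q} dx_1 \cdots dx_k$ is taken as the derivative with respect to $q$ of the volume of the $k$-ball $\{\mathbf{x} : \mathbf{x}^{\mathrm{T}}\mathbf{x} \le q\}$. The sketch compares a finite-difference slope of that volume with $\pi^{k/2}q^{(k/2)-1}/\Gamma(k/2)$ and with the $q^{k/2}$ variant, and confirms that the normal density factor times the corrected integral reproduces the chi-square pdf. All function names and the values of `k`, `q` and `h` are arbitrary choices.

```python
import math


def ball_volume(k, q):
    """Volume of the k-ball {x : x^T x <= q}, i.e. radius sqrt(q)."""
    return math.pi ** (k / 2) * q ** (k / 2) / math.gamma(k / 2 + 1)


def shell_integral(k, q):
    """Corrected value pi^{k/2} q^{k/2 - 1} / Gamma(k/2)."""
    return math.pi ** (k / 2) * q ** (k / 2 - 1) / math.gamma(k / 2)


def shell_integral_as_printed(k, q):
    """Variant with q^{k/2}, as the text reports from Khatri (1963, p. 53)."""
    return math.pi ** (k / 2) * q ** (k / 2) / math.gamma(k / 2)


def chi2_pdf(k, q):
    """Chi-square density with k df evaluated at q."""
    return q ** (k / 2 - 1) * math.exp(-q / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def normal_product(k, q):
    """Product of k standard normal densities evaluated on x^T x = q."""
    return (2 * math.pi) ** (-k / 2) * math.exp(-q / 2)


k, q, h = 5, 2.3, 1e-5
# Central difference of the ball volume with respect to q.
slope = (ball_volume(k, q + h) - ball_volume(k, q - h)) / (2 * h)
print(slope, shell_integral(k, q), shell_integral_as_printed(k, q))
print(normal_product(k, q) * shell_integral(k, q), chi2_pdf(k, q))
```

The two expressions differ by the factor $q$, so any $q \ne 1$ separates them.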


Proof 6 of the Wishart density in Theorem 2 (Khatri, 1963). The brevity of Khatri's 1.5-page derivation is due partially to his concise description. The article is poorly documented: it has no title, the citations mentioned earlier give the same incorrect page numbers, and it contains several possible typos, including the above one, at important points as well as other minor errors. The corrected proof is therefore provided here with some added explanations. The derivation consists of a $p$-step variable transformation with $p$ Jacobians, most of which cancel after multiplication.
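Define the $p \times n$ matrix $\mathbf{X}_{\Sigma}$, where each column independently follows $N_p(\mathbf{0}, \boldsymbol{\Sigma})$. Partition

$$
\mathbf{S}_{\Sigma} = \mathbf{X}_{\Sigma}\mathbf{X}_{\Sigma}^{\mathrm{T}} = \begin{pmatrix} \mathbf{S}_{p-1} & \mathbf{s}_{p-1} \\ \mathbf{s}_{p-1}^{\mathrm{T}} & s_{pp} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_{p-1}\mathbf{X}_{p-1}^{\mathrm{T}} & \mathbf{X}_{p-1}\mathbf{x}_p \\ \mathbf{x}_p^{\mathrm{T}}\mathbf{X}_{p-1}^{\mathrm{T}} & \mathbf{x}_p^{\mathrm{T}}\mathbf{x}_p \end{pmatrix},
$$

where e.g., $s_{pp}$ is temporarily used in place of $s_{\Sigma pp}$ for simplicity. Define the $n \times n$ matrix $\mathbf{P}_n = \begin{pmatrix} \mathbf{X}_{p-1} \\ \mathbf{Y}_{n-p+1} \end{pmatrix}$, where the $(n-p+1) \times n$ submatrix $\mathbf{Y}_{n-p+1}$ is chosen such that $\mathbf{Y}_{n-p+1}\mathbf{X}_{p-1}^{\mathrm{T}} = \mathbf{O}$ and $\mathbf{Y}_{n-p+1}\mathbf{Y}_{n-p+1}^{\mathrm{T}} = \mathbf{I}_{n-p+1}$. Then, we have

$$
\mathbf{P}_n\mathbf{P}_n^{\mathrm{T}} = \begin{pmatrix} \mathbf{S}_{p-1} & \mathbf{O} \\ \mathbf{O} & \mathbf{I}_{n-p+1} \end{pmatrix},
$$

which gives $|\mathbf{P}_n|_{+} = |\mathbf{P}_n\mathbf{P}_n^{\mathrm{T}}|^{1/2} = |\mathbf{S}_{p-1}|^{1/2}$. Consider the variable transformation from $\mathbf{x}_p$ to $\mathbf{P}_n\mathbf{x}_p$ with $(\mathbf{s}_{p-1}^{\mathrm{T}}, \mathbf{z}_{n-p+1}^{\mathrm{T}})^{\mathrm{T}} = \mathbf{P}_n\mathbf{x}_p$, where $\mathbf{z}_{n-p+1} = \mathbf{Y}_{n-p+1}\mathbf{x}_p$ and $J(\mathbf{x}_p \to \mathbf{P}_n\mathbf{x}_p) = |\mathbf{P}_n|_{+}^{-1} = |\mathbf{S}_{p-1}|^{-1/2}$. Since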


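$$
\begin{aligned}
s_{pp} = \mathbf{x}_p^{\mathrm{T}}\mathbf{x}_p &= (\mathbf{s}_{p-1}^{\mathrm{T}}, \mathbf{z}_{n-p+1}^{\mathrm{T}}) (\mathbf{P}_n^{\mathrm{T}})^{-1}\mathbf{P}_n^{-1} (\mathbf{s}_{p-1}^{\mathrm{T}}, \mathbf{z}_{n-p+1}^{\mathrm{T}})^{\mathrm{T}} \\
&= (\mathbf{s}_{p-1}^{\mathrm{T}}, \mathbf{z}_{n-p+1}^{\mathrm{T}}) \begin{pmatrix} \mathbf{S}_{p-1} & \mathbf{O} \\ \mathbf{O} & \mathbf{I}_{n-p+1} \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{s}_{p-1} \\ \mathbf{z}_{n-p+1} \end{pmatrix} \\
&= \mathbf{s}_{p-1}^{\mathrm{T}}\mathbf{S}_{p-1}^{-1}\mathbf{s}_{p-1} + \mathbf{z}_{n-p+1}^{\mathrm{T}}\mathbf{z}_{n-p+1},
\end{aligned}
$$

we have $\mathbf{z}_{n-p+1}^{\mathrm{T}}\mathbf{z}_{n-p+1} = s_{pp} - \mathbf{s}_{p-1}^{\mathrm{T}}\mathbf{S}_{p-1}^{-1}\mathbf{s}_{p-1} = |\mathbf{S}_{\Sigma}|/|\mathbf{S}_{p-1}|$.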
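The last equality is the Schur-complement form of the determinant of a partitioned matrix. A small exact check is sketched below with an arbitrary integer data matrix; the helper names and the use of Cramer's rule for the linear solve are choices made for the illustration, not part of Khatri's argument.

```python
from fractions import Fraction


def det(m):
    """Determinant via Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n, d = len(a), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d


def gram(x):
    """S = X X^T for a matrix X given as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]


def solve(m, b):
    """Solve m y = b by Cramer's rule (adequate for the small sizes used here)."""
    d = det(m)
    sol = []
    for c in range(len(m)):
        mc = [row[:c] + [b[r]] + row[c + 1:] for r, row in enumerate(m)]
        sol.append(det(mc) / d)
    return sol


# Arbitrary 3 x 5 data matrix (p = 3, n = 5): the first two rows play the
# role of X_{p-1} and the last row that of x_p^T.
x = [[1, 2, 0, -1, 3],
     [0, 1, 4, 2, -2],
     [2, -1, 1, 0, 1]]
s = gram(x)
p = len(s)
s_lead = [row[:p - 1] for row in s[:p - 1]]      # S_{p-1}
s_vec = [s[i][p - 1] for i in range(p - 1)]      # s_{p-1}
s_pp = s[p - 1][p - 1]

y = solve(s_lead, s_vec)                          # S_{p-1}^{-1} s_{p-1}
schur = s_pp - sum(a * b for a, b in zip(s_vec, y))
ratio = det(s) / det(s_lead)
print(schur, ratio)
```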
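Using the multivariate normal density, the joint marginal density of $\mathbf{X}_{p-1}$, when the random matrix $\mathbf{S}_{\Sigma}$ is held fixed at a given value, becomes

$$
\begin{aligned}
f_{\mathbf{X}_{p-1}} &= \int_{\mathbf{z}_{n-p+1}^{\mathrm{T}}\mathbf{z}_{n-p+1} = |\mathbf{S}_{\Sigma}|/|\mathbf{S}_{p-1}|} (2\pi)^{-np/2}|\boldsymbol{\Sigma}|^{-n/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\} \\
&\qquad \times J\{\mathbf{x}_p \to (\mathbf{s}_{p-1}^{\mathrm{T}}, \mathbf{z}_{n-p+1}^{\mathrm{T}})^{\mathrm{T}}\}\, dz_1 \cdots dz_{n-p+1} \\
&= \int_{\mathbf{z}_{n-p+1}^{\mathrm{T}}\mathbf{z}_{n-p+1} = |\mathbf{S}_{\Sigma}|/|\mathbf{S}_{p-1}|} (2\pi)^{-np/2}|\boldsymbol{\Sigma}|^{-n/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}\, |\mathbf{S}_{p-1}|^{-1/2}\, d\mathbf{z}_{n-p+1},
\end{aligned}
$$

where the integrand does not include $\mathbf{z}_{n-p+1}$. Then, the above integral becomes

$$
\begin{aligned}
f_{\mathbf{X}_{p-1}} &= (2\pi)^{-np/2}|\boldsymbol{\Sigma}|^{-n/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}\, |\mathbf{S}_{p-1}|^{-1/2} \int_{\mathbf{z}_{n-p+1}^{\mathrm{T}}\mathbf{z}_{n-p+1} = |\mathbf{S}_{\Sigma}|/|\mathbf{S}_{p-1}|} d\mathbf{z}_{n-p+1} \\
&= (2\pi)^{-np/2}|\boldsymbol{\Sigma}|^{-n/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}\, |\mathbf{S}_{p-1}|^{-1/2}\, \frac{\pi^{(n-p+1)/2}(|\mathbf{S}_{\Sigma}|/|\mathbf{S}_{p-1}|)^{(n-p-1)/2}}{\Gamma\{(n-p+1)/2\}} \\
&= \frac{\pi^{(n-p+1)/2}|\mathbf{S}_{\Sigma}|^{(n-p-1)/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}\Gamma\{(n-p+1)/2\}} \cdot \frac{1}{|\mathbf{S}_{p-1}|^{(n-p)/2}},
\end{aligned}
$$

where Khatri's (p. 54) expression $|\mathbf{S}_{p-1}|^{(n-p-2)/2}$ using our notation in place of $|\mathbf{S}_{p-1}|^{(n-p)/2}$ is incorrect. Define $\mathbf{X}_{p-i}\{(p-i) \times n\}$, $\mathbf{Y}_{n-p+i}\{(n-p+i) \times n\}$, $\mathbf{S}_{p-i}\{(p-i) \times (p-i)\}$, $\mathbf{s}_{p-i}\{(p-i) \times 1\}$ and $\mathbf{z}_{n-p+i}\{(n-p+i) \times 1\}$ $(i = 2, ..., p-1)$ similarly to those when $i = 1$, respectively. Then, using these matrices and vectors in similar manners, we have the successive transformations as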


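$$
\begin{aligned}
f_{\mathbf{X}_1} &= \frac{\pi^{(n-p+1)/2}|\mathbf{S}_{\Sigma}|^{(n-p-1)/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}\Gamma\{(n-p+1)/2\}} \cdot \frac{1}{|\mathbf{S}_{p-1}|^{(n-p)/2}} \\
&\qquad \times \prod_{i=2}^{p-1} \int_{\mathbf{z}_{n-p+i}^{\mathrm{T}}\mathbf{z}_{n-p+i} = |\mathbf{S}_{p-i+1}|/|\mathbf{S}_{p-i}|} |\mathbf{S}_{p-i}|^{-1/2}\, d\mathbf{z}_{n-p+i} \\
&= \frac{\pi^{(n-p+1)/2}|\mathbf{S}_{\Sigma}|^{(n-p-1)/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}\Gamma\{(n-p+1)/2\}} \cdot \frac{1}{|\mathbf{S}_{p-1}|^{(n-p)/2}} \\
&\qquad \times \prod_{i=2}^{p-1} \frac{\pi^{(n-p+i)/2}\, |\mathbf{S}_{p-i+1}|^{(n-p+i-2)/2}}{\Gamma\{(n-p+i)/2\}\, |\mathbf{S}_{p-i}|^{(n-p+i-1)/2}} \\
&= \frac{\pi^{[(n-p)(p-1)+\{p(p-1)/2\}-np]/2}\, |\mathbf{S}_{\Sigma}|^{(n-p-1)/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}}{2^{np/2}|\boldsymbol{\Sigma}|^{n/2} \prod_{i=1}^{p-1}\Gamma\{(n-p+i)/2\}}\, |\mathbf{S}_1|^{-(n-2)/2} \\
&= \frac{|\mathbf{S}_{\Sigma}|^{(n-p-1)/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}}{2^{np/2}|\boldsymbol{\Sigma}|^{n/2}\pi^{p(p-1)/4} \prod_{i=1}^{p}\Gamma\{(n-p+i)/2\}} \cdot \frac{(\mathbf{x}_1^{\mathrm{T}}\mathbf{x}_1)^{-(n-2)/2}}{\pi^{n/2}/\Gamma(n/2)}.
\end{aligned}
$$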


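Noting that $(\mathbf{x}_1^{\mathrm{T}}\mathbf{x}_1)^{-(n-2)/2} = |\mathbf{S}_1|^{-(n-2)/2} = s_{\Sigma 11}^{-(n-2)/2}$ is a fixed quantity, the last step is the integral with respect to the row vector $\mathbf{x}_1^{\mathrm{T}}$:

$$
\begin{aligned}
w_p(\mathbf{S}_{\Sigma}|\boldsymbol{\Sigma}, n) &= \int_{\mathbf{x}_1^{\mathrm{T}}\mathbf{x}_1 = s_{\Sigma 11}} f_{\mathbf{X}_1}\, d\mathbf{x}_1 = f_{\mathbf{X}_1}\, \frac{\pi^{n/2} s_{\Sigma 11}^{(n/2)-1}}{\Gamma(n/2)} \\
&= \frac{|\mathbf{S}_{\Sigma}|^{(n-p-1)/2} \exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}}{2^{np/2}|\boldsymbol{\Sigma}|^{n/2}\pi^{p(p-1)/4} \prod_{i=1}^{p}\Gamma\{(n-p+i)/2\}}.
\end{aligned}
$$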
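Since Proof 6 hinges on exponents cancelling across the $p$ steps, the bookkeeping can be replayed mechanically. The following sketch tracks the exponent of $\pi$ and of each $|\mathbf{S}_m|$ $(m = 1, ..., p$, with $\mathbf{S}_p = \mathbf{S}_{\Sigma})$ using exact fractions. The function `khatri_exponents` is a hypothetical helper written for this check; it encodes only the Jacobian $|\mathbf{S}_{p-i}|^{-1/2}$ and the integral $\pi^{k/2}q^{(k/2)-1}/\Gamma(k/2)$ stated above.

```python
from fractions import Fraction


def khatri_exponents(n, p):
    """Replay the p integration steps and return the accumulated exponents.

    Returns (exponent of pi, {m: exponent of |S_m|}, exponent of |S_{p-1}|
    right after the first step, or None when p = 1).
    """
    half = Fraction(1, 2)
    pi_exp = -Fraction(n * p, 2)                # from (2 pi)^{-np/2}
    s_exp = {m: Fraction(0) for m in range(1, p + 1)}
    after_first = None
    for i in range(1, p):                       # steps i = 1, ..., p - 1
        k = n - p + i                           # dimension of z_{n-p+i}
        s_exp[p - i] -= half                    # Jacobian |S_{p-i}|^{-1/2}
        # Integral pi^{k/2} q^{k/2-1} / Gamma(k/2), q = |S_{p-i+1}| / |S_{p-i}|.
        pi_exp += Fraction(k, 2)
        s_exp[p - i + 1] += Fraction(k, 2) - 1
        s_exp[p - i] -= Fraction(k, 2) - 1
        if i == 1:
            after_first = s_exp[p - 1]
    # Last step: integrate x_1 over x_1^T x_1 = |S_1| (k = n, no Jacobian).
    pi_exp += Fraction(n, 2)
    s_exp[1] += Fraction(n, 2) - 1
    return pi_exp, s_exp, after_first


for n, p in [(7, 3), (10, 4), (5, 2)]:
    pi_exp, s_exp, after_first = khatri_exponents(n, p)
    print(n, p, pi_exp, s_exp, after_first)
```

For each pair $(n, p)$ the only surviving determinant should be $|\mathbf{S}_{\Sigma}|^{(n-p-1)/2}$, the exponent of $\pi$ should be $-p(p-1)/4$, and the exponent of $|\mathbf{S}_{p-1}|$ after the first step should be $-(n-p)/2$, in line with the correction noted above.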


Appendix C The exchanged Bartlett decomposition 


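The Bartlett decomposition $\mathbf{S} = \mathbf{T}\mathbf{T}^{\mathrm{T}}$ has been used in this paper as well as in the literature. Let $\mathbf{S} = \mathbf{U}\mathbf{U}^{\mathrm{T}}$, where $\mathbf{U}$ $(\ne \mathbf{T}^{\mathrm{T}})$ is the upper-triangular matrix whose non-zero elements are random variables. Note that $\mathbf{U}$ can be obtained by rotating $\mathbf{T}$ as $\mathbf{U} = \mathbf{T}\mathbf{V}$ using an orthonormal matrix $\mathbf{V}$. Define the upper-triangular matrix $\mathbf{C}$ satisfying $\boldsymbol{\Sigma} = \mathbf{C}\mathbf{C}^{\mathrm{T}}$ with $c_{ii} > 0$ $(i = 1, ..., p)$, where $\mathbf{C}$ is obtained by $\mathbf{C} = \mathbf{B}\mathbf{V}^{*}$ and $\mathbf{V}^{*}$ is another orthonormal matrix. Recall that the Cholesky decomposition $\boldsymbol{\Sigma} = \mathbf{B}\mathbf{B}^{\mathrm{T}}$ was used earlier. The form $\boldsymbol{\Sigma} = \mathbf{C}\mathbf{C}^{\mathrm{T}}$ is also called the exchanged (reversed) Cholesky or upper-lower (UL) decomposition in this paper.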


Remark 5. Consider the distribution of $u_{ij}$ $(i = 1, ..., p;\ j = i, ..., p)$, which are assumed to be mutually independent. As in the case of the usual Bartlett decomposition, Lemma 1 shows that when $u_{ii}$ is chi-distributed with $n-p+i$ df $(i = 1, ..., p)$ and $u_{ij}$ is standard normal $(i = 1, ..., p;\ j = i+1, ..., p)$, the distribution of $\mathbf{S} = \mathbf{X}\mathbf{X}^{\mathrm{T}}$ $(= \mathbf{T}\mathbf{T}^{\mathrm{T}})$ is the same as that of $\mathbf{U}\mathbf{U}^{\mathrm{T}}$. Note that $t_{ii}$ is chi-distributed with $n-i+1$ df rather than $n-p+i$. The joint pdf of $\mathbf{U}$ denoted by $f_p(\mathbf{U})$ becomes


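$$
\begin{aligned}
f_p(\mathbf{U}) &= \left[ \prod_{i=1}^{p} \frac{u_{ii}^{n-p+i-1} \exp(-u_{ii}^2/2)}{2^{\{(n-p+i)/2\}-1}\Gamma\{(n-p+i)/2\}} \right] \prod_{1 \le i < j \le p} \frac{1}{(2\pi)^{1/2}} \exp(-u_{ij}^2/2) \\
&= \frac{\left\{ \prod_{i=1}^{p} u_{ii}^{n-p+i-1} \exp(-u_{ii}^2/2) \right\} \prod_{1 \le i < j \le p} \exp(-u_{ij}^2/2)}{2^{\{(n-p)p/2\}+\{p(p+1)/4\}-p+\{p(p-1)/4\}}\, \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\{(n-p+i)/2\}} \\
&= \frac{\left( \prod_{i=1}^{p} u_{ii}^{n-p+i-1} \right) \exp\{-\mathrm{tr}(\mathbf{U}\mathbf{U}^{\mathrm{T}})/2\}}{2^{(np/2)-p}\Gamma_p(n/2)}.
\end{aligned}
$$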
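A quick simulation can illustrate the degrees of freedom in Remark 5. With $u_{ii}$ chi-distributed with $n-p+i$ df and standard normal $u_{ij}$ $(i < j)$, the $i$th diagonal element of $\mathbf{U}\mathbf{U}^{\mathrm{T}}$ has mean $(n-p+i)+(p-i) = n$, as required for $\mathbf{S} = \mathbf{X}\mathbf{X}^{\mathrm{T}}$ in the uncorrelated case. The sketch below checks only this first moment, which is a necessary consequence and not a proof of equality in distribution; the function names, the seed and the number of replications are arbitrary choices.

```python
import math
import random


def sample_u(n, p, rng):
    """Upper-triangular U: u_ii ~ chi with n-p+i df (1-based i), u_ij ~ N(0, 1) for i < j."""
    u = [[0.0] * p for _ in range(p)]
    for i in range(p):
        df = n - p + (i + 1)
        # A chi-square variate with df degrees of freedom is gamma(df/2, scale 2).
        u[i][i] = math.sqrt(rng.gammavariate(df / 2, 2.0))
        for j in range(i + 1, p):
            u[i][j] = rng.gauss(0.0, 1.0)
    return u


def mean_uut(n, p, reps, seed=0):
    """Monte Carlo average of U U^T over `reps` independent draws."""
    rng = random.Random(seed)
    acc = [[0.0] * p for _ in range(p)]
    for _ in range(reps):
        u = sample_u(n, p, rng)
        for i in range(p):
            for j in range(p):
                acc[i][j] += sum(u[i][k] * u[j][k] for k in range(p))
    return [[v / reps for v in row] for row in acc]


n, p = 7, 3
m = mean_uut(n, p, reps=20000)
for row in m:
    print([round(v, 2) for v in row])   # should be close to n times the identity
```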


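Proof 7 of the Wishart density in Theorem 2. Consider the one-step transformation from $\mathbf{U}$ to $\mathbf{S}_{\Sigma} = \mathbf{C}\mathbf{X}\mathbf{X}^{\mathrm{T}}\mathbf{C}^{\mathrm{T}} = \mathbf{C}\mathbf{S}\mathbf{C}^{\mathrm{T}} = \mathbf{C}\mathbf{U}\mathbf{U}^{\mathrm{T}}\mathbf{C}^{\mathrm{T}}$, where it is found that $\mathbf{C}(\mathbf{X})_{\cdot j} \sim N_p(\mathbf{0}, \boldsymbol{\Sigma})$ $(j = 1, ..., n)$. Redefine the vector of the non-duplicated elements in $\mathbf{S}_{\Sigma}$ as $\mathbf{s}_{\Sigma} = (s_{\Sigma 11}, ..., s_{\Sigma 1p}, s_{\Sigma 22}, ..., s_{\Sigma 2p}, ..., s_{\Sigma pp})^{\mathrm{T}}$, whose elements are lexicographically ordered. Similarly, define the $\{p(p+1)/2\} \times 1$ vectors $\mathbf{c}$ and $\mathbf{u}$ using the corresponding elements of $\mathbf{C}$ and $\mathbf{U}$, respectively.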

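The proof is similar to Proof 1 of Lemma 5. Since $\mathbf{C}$, $\mathbf{U}$ and $\mathbf{C}\mathbf{U}$ are upper-triangular, the Jacobian matrix $\partial \mathbf{s}_{\Sigma}/\partial \mathbf{u}^{\mathrm{T}} = \{\partial s_{\Sigma ij}/\partial u_{kl}\}$ $(1 \le i \le j \le p;\ 1 \le k \le l \le p)$ becomes upper-triangular, whose diagonal elements are

$$
\frac{\partial s_{\Sigma ij}}{\partial u_{ij}} = \{\mathbf{C}(\mathbf{E}_{ij}\mathbf{U}^{\mathrm{T}} + \mathbf{U}\mathbf{E}_{ji})\mathbf{C}^{\mathrm{T}}\}_{ij} = (\mathbf{C}\mathbf{E}_{ij}\mathbf{U}^{\mathrm{T}}\mathbf{C}^{\mathrm{T}})_{ij} = c_{ii}u_{jj}c_{jj} \quad (1 \le i < j \le p)
$$

and

$$
\frac{\partial s_{\Sigma ii}}{\partial u_{ii}} = \{\mathbf{C}(\mathbf{E}_{ii}\mathbf{U}^{\mathrm{T}} + \mathbf{U}\mathbf{E}_{ii})\mathbf{C}^{\mathrm{T}}\}_{ii} = 2c_{ii}^2u_{ii} \quad (i = 1, ..., p).
$$

Since the determinant of the Jacobian matrix or $J(\mathbf{S}_{\Sigma} \to \mathbf{U})$ becomes

$$
\begin{aligned}
\prod_{i=1}^{p} \prod_{j=i}^{p} \frac{\partial s_{\Sigma ij}}{\partial u_{ij}} &= \left( \prod_{i=1}^{p} \prod_{j=i+1}^{p} \frac{\partial s_{\Sigma ij}}{\partial u_{ij}} \right) \prod_{i=1}^{p} \frac{\partial s_{\Sigma ii}}{\partial u_{ii}} = 2^p \prod_{i=1}^{p} \prod_{j=i}^{p} c_{ii}u_{jj}c_{jj} \\
&= 2^p \left( \prod_{i=1}^{p} c_{ii}^{p-i+1} \right) \prod_{j=1}^{p} u_{jj}^{j}c_{jj}^{j} = 2^p \prod_{i=1}^{p} c_{ii}^{p+1}u_{ii}^{i} = 2^p|\mathbf{C}|^{p+1} \prod_{i=1}^{p} u_{ii}^{i} \\
&= 2^p|\boldsymbol{\Sigma}|^{(p+1)/2} \prod_{i=1}^{p} u_{ii}^{i},
\end{aligned}
$$

$J(\mathbf{U} \to \mathbf{S}_{\Sigma})$ is given by the reciprocal of the above quantity.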
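The triangular structure and the determinant above can be confirmed exactly for a small case. The sketch below is an illustration with arbitrary integer matrices `c` and `u`. It assembles the Jacobian matrix from $\partial \mathbf{S}_{\Sigma}/\partial u_{kl} = \mathbf{C}(\mathbf{E}_{kl}\mathbf{U}^{\mathrm{T}} + \mathbf{U}\mathbf{E}_{lk})\mathbf{C}^{\mathrm{T}}$ with lexicographically ordered upper-triangular indices and compares its determinant with $2^p|\mathbf{C}|^{p+1}\prod_{i} u_{ii}^{i}$, which equals $2^p|\boldsymbol{\Sigma}|^{(p+1)/2}\prod_{i} u_{ii}^{i}$ because $|\boldsymbol{\Sigma}| = |\mathbf{C}|^2$. The helper names are assumptions made for the example.

```python
from fractions import Fraction


def det(m):
    """Determinant via Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n, d = len(a), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def jacobian_s_wrt_u(c, u):
    """Exact matrix {d s_ij / d u_kl} for S = C U U^T C^T (upper triangles, lexicographic order)."""
    p = len(c)
    idx = [(i, j) for i in range(p) for j in range(i, p)]
    ct, ut = transpose(c), transpose(u)
    columns = []
    for (k, l) in idx:
        e = [[0] * p for _ in range(p)]
        e[k][l] = 1
        # dS/du_kl = C (E_kl U^T + U E_lk) C^T
        inner = [[x + y for x, y in zip(r1, r2)]
                 for r1, r2 in zip(matmul(e, ut), matmul(u, transpose(e)))]
        ds = matmul(matmul(c, inner), ct)
        columns.append([ds[i][j] for (i, j) in idx])
    return transpose(columns)


# Upper-triangular matrices with positive diagonals (arbitrary illustrative values).
c = [[2, 1, -1], [0, 3, 2], [0, 0, 1]]
u = [[1, 4, -2], [0, 2, 5], [0, 0, 3]]
p = len(c)
jac = jacobian_s_wrt_u(c, u)
lhs = det(jac)
rhs = 2 ** p * det(c) ** (p + 1)
for i in range(p):
    rhs *= u[i][i] ** (i + 1)
print(lhs, rhs)
```

Because $\mathbf{S}_{\Sigma}$ is quadratic in $\mathbf{U}$, the derivative formula is exact and no finite-difference step is needed.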
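The Wishart density is given by $f_p(\mathbf{U})$ and $J(\mathbf{U} \to \mathbf{S}_{\Sigma})$:

$$
\begin{aligned}
w_p(\mathbf{S}_{\Sigma}|\boldsymbol{\Sigma}, n) &= f_p(\mathbf{U})\, J(\mathbf{U} \to \mathbf{S}_{\Sigma}) \\
&= \frac{\exp\{-\mathrm{tr}(\mathbf{U}\mathbf{U}^{\mathrm{T}})/2\} \prod_{i=1}^{p} u_{ii}^{n-p+i-1}}{2^{(np/2)-p}\Gamma_p(n/2)} \cdot \frac{|\boldsymbol{\Sigma}|^{-(p+1)/2}}{2^p \prod_{i=1}^{p} u_{ii}^{i}} \\
&= \frac{\exp\{-\mathrm{tr}(\mathbf{U}\mathbf{U}^{\mathrm{T}})/2\}\, |\boldsymbol{\Sigma}|^{-(p+1)/2} \prod_{i=1}^{p} u_{ii}^{n-p-1}}{2^{np/2}\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(\mathbf{C}^{-1}\mathbf{S}_{\Sigma}\mathbf{C}^{\mathrm{T}-1})/2\}\, |\boldsymbol{\Sigma}|^{-(p+1)/2}\, |\mathbf{C}^{-1}\mathbf{S}_{\Sigma}\mathbf{C}^{\mathrm{T}-1}|^{(n-p-1)/2}}{2^{np/2}\Gamma_p(n/2)} \\
&= \frac{\exp\{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S}_{\Sigma})/2\}\, |\mathbf{S}_{\Sigma}|^{(n-p-1)/2}}{2^{np/2}|\boldsymbol{\Sigma}|^{n/2}\Gamma_p(n/2)}
\end{aligned}
$$

as expected.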


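Remark 6. Though $\mathbf{U} \ne \mathbf{T}^{\mathrm{T}}$ as noted earlier, $\mathbf{U}$ is obtained by reversing the row indexes of $\mathbf{T}$ followed by the similar reversal of the column ones. When $p = 3$, this transformation proceeds as

$$
\begin{pmatrix} t_{11} & 0 & 0 \\ t_{21} & t_{22} & 0 \\ t_{31} & t_{32} & t_{33} \end{pmatrix}
\to \begin{pmatrix} t_{31} & t_{32} & t_{33} \\ t_{21} & t_{22} & 0 \\ t_{11} & 0 & 0 \end{pmatrix}
\to \begin{pmatrix} t_{33} & t_{32} & t_{31} \\ 0 & t_{22} & t_{21} \\ 0 & 0 & t_{11} \end{pmatrix}
= \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}.
$$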


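The above example indicates other decompositions $\mathbf{S} = \mathbf{T}^{*}\mathbf{T}^{*\mathrm{T}} = \mathbf{U}^{*}\mathbf{U}^{*\mathrm{T}}$ with the unchanged distribution of $\mathbf{S} = \mathbf{X}\mathbf{X}^{\mathrm{T}}$, where $\mathbf{T}^{*}$ $(\mathbf{U}^{*})$ is a lower (upper)-triangular matrix defined with the non-zero elements on and below (above) the minor diagonals. Note that $\mathbf{T}^{*}$ and $\mathbf{U}^{*}$ are obtained from $\mathbf{T}$ and $\mathbf{U}$ by reversing the row or column indexes. When $p = 3$, $\mathbf{T}^{*}$ and $\mathbf{U}^{*}$ are

$$
\begin{pmatrix} 0 & 0 & t_{11} \\ 0 & t_{22} & t_{21} \\ t_{33} & t_{32} & t_{31} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & t^{*}_{13} \\ 0 & t^{*}_{22} & t^{*}_{23} \\ t^{*}_{31} & t^{*}_{32} & t^{*}_{33} \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} u_{13} & u_{12} & u_{11} \\ u_{23} & u_{22} & 0 \\ u_{33} & 0 & 0 \end{pmatrix}
= \begin{pmatrix} u^{*}_{11} & u^{*}_{12} & u^{*}_{13} \\ u^{*}_{21} & u^{*}_{22} & 0 \\ u^{*}_{31} & 0 & 0 \end{pmatrix},
$$

respectively.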
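The index reversals in Remark 6 can be tried on a small integer example. In the sketch below (helper names and the entries of `t` are illustrative assumptions), reversing the columns of $\mathbf{T}$ leaves $\mathbf{T}\mathbf{T}^{\mathrm{T}}$ unchanged, whereas reversing both rows and columns yields an upper-triangular matrix whose cross-product is $\mathbf{S}$ with its rows and columns reversed. The latter has the same distribution as $\mathbf{S}$ in the uncorrelated case, since reversing the rows of $\mathbf{X}$ does not change the distribution of $\mathbf{X}$.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def reverse_rows(a):
    return [list(row) for row in a[::-1]]


def reverse_cols(a):
    return [row[::-1] for row in a]


# Lower-triangular T with arbitrary illustrative integer entries.
t = [[2, 0, 0],
     [1, 3, 0],
     [4, 5, 6]]
s = matmul(t, transpose(t))

u = reverse_cols(reverse_rows(t))   # rows reversed, then columns: upper-triangular
t_star = reverse_cols(t)            # non-zeros on and below the minor diagonal
u_star = reverse_cols(u)            # non-zeros on and above the minor diagonal

print(u)
print(matmul(u, transpose(u)) == reverse_cols(reverse_rows(s)))
print(matmul(t_star, transpose(t_star)) == s)
print(matmul(u_star, transpose(u_star)) == matmul(u, transpose(u)))
```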


Actually, we have infinitely many transformations with the unchanged distribution of $\mathbf{S}$, including the above ones, using various orthonormal $p \times p$ matrices denoted by $\mathbf{V}$'s since each column of $\mathbf{V}\mathbf{X}$ independently follows $N_p(\mathbf{0}, \mathbf{I}_p)$ (see e.g., Anderson, 2003, Theorem 3.3.1). In other words, the distributions of $\mathbf{V}\mathbf{X}$ and $\mathbf{X}$ are the same. Then, $\mathbf{S} = \mathbf{X}\mathbf{X}^{\mathrm{T}}$ can be replaced by $\mathbf{S} = \mathbf{V}\mathbf{X}\mathbf{X}^{\mathrm{T}}\mathbf{V}^{\mathrm{T}}$. Note that any one of the decomposed matrices, e.g., $\mathbf{T}$, $\mathbf{T}^{*}$, $\mathbf{U}$ and $\mathbf{U}^{*}$, is given by another one using $\mathbf{V}$, as in $\mathbf{T} = \mathbf{V}\mathbf{U}^{*}$. This indeterminacy of transformation is similar to the rotational indeterminacy in orthogonal rotation in factor analysis and canonical correlation analysis or, more generally, transformations in structural equation modeling (Ogasawara, 2007; Schuberth, 2021; Yu, Schuberth, & Henseler, 2023).


